Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: restore walkinrange optimization (by moving to SSA) #30645

Closed
josharian opened this issue Mar 6, 2019 · 18 comments
Closed

cmd/compile: restore walkinrange optimization (by moving to SSA) #30645

josharian opened this issue Mar 6, 2019 · 18 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@josharian
Copy link
Contributor

I’m on my phone, but I suspect that CL 165617 disabled the walkinrange optimization. This makes sense—order now rewrites away OANDAND and OOROR nodes, but those are what walkinrange operates on.

One symptom (I believe) is this:

https://go-review.googlesource.com/c/go/+/165617/1/test/checkbce.go#36

We have long wanted to move walkinrange to the SSA backend; this is a good motivation.

I’m marking this as a 1.13 release blocker, since IIRC walkinrange is a significant optimization for some hot code (usually parsing/formatting code looking for bytes in a range).

@josharian josharian added this to the Go1.13 milestone Mar 6, 2019
@josharian josharian added Performance release-blocker NeedsFix The path to resolution is known, but the work has not been done. labels Mar 6, 2019
@josharian
Copy link
Contributor Author

cc @randall77 @dr2chase

@josharian
Copy link
Contributor Author

(This is also an indicator that this optimization needs an explicit asm test.)

@ALTree ALTree changed the title cmd/compile: restore walkingrange optimization (by moving to SSA) cmd/compile: restore walkinrange optimization (by moving to SSA) Mar 7, 2019
@ALTree
Copy link
Member

ALTree commented Mar 7, 2019

I suspect that CL 165617 disabled the walkinrange optimization

It appears that's the case. Given

package p

func f(c uint64) bool {
	return 0x10 <= c && c < 0x20
}

on 178a2c4 (a little before that CL):

RETURN l(4) tc(1)
RETURN-list
.   AS l(4) tc(1)
.   .   NAME-p.~r1 a(true) g(1) l(3) x(8) class(PPARAMOUT) bool
.   .   LT l(4) tc(1) bool
.   .   .   CONVNOP l(4) tc(1) uint64
.   .   .   .   SUB l(4) tc(1) uint64
.   .   .   .   .   NAME-p.c a(true) g(2) l(3) x(0) class(PPARAM) tc(1) used uint64
.   .   .   .   .   LITERAL-16 l(4) tc(1) uint64
.   .   .   LITERAL-16 l(4) tc(1) uint64

on a2ace8e (after):

AS l(4) tc(1)
.   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool
.   LE l(4) tc(1) bool
.   .   LITERAL-16 l(4) tc(1) uint64
.   .   NAME-p.c a(true) g(2) l(3) x(0) class(PPARAM) tc(1) used uint64

IF l(4)
.   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool
IF-body
.   AS l(4) tc(1)
.   .   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool
.   .   LT l(4) tc(1) bool
.   .   .   NAME-p.c a(true) g(2) l(3) x(0) class(PPARAM) tc(1) used uint64
.   .   .   LITERAL-32 l(4) tc(1) uint64

RETURN l(4) tc(1)
RETURN-list
.   AS l(4) tc(1)
.   .   NAME-p.~r1 a(true) g(1) l(3) x(8) class(PPARAMOUT) bool
.   .   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool

VARKILL tc(1)
.   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool

And the generated code contains two CMPQ instead of one.

@mundaym
Copy link
Member

mundaym commented Mar 7, 2019

Not sure if this is exactly what you're after but I've been tinkering with some simple logical optimizations for control flow graphs (it's very rough code at the moment, I was just experimenting, but it does work):

https://go-review.googlesource.com/c/go/+/165998/

This can do optimizations such as this for disjunctions (and similar optimizations for conjunctions):

x == 1 || x == 2 -> unsigned(x - 1) <= 1
x == 4 || x == 6 -> (x | 2) == 6
x != 0 || y != 0 -> (x | y) != 0

It also has some optimizations to clean up some uses of IsSliceInBounds which maybe shouldn't have been generated in the first place:

x >= 0 && IsSliceInBounds(x, y) -> IsSliceInBounds(x, y)

I scheduled it after prove since I was a bit worried it might obscure some facts.

Unfortunately it is quite unpleasant adding new optimizations, especially if you want to add handle multiple bit widths. So I was thinking of extending the rewrite rules a bit so you could write something like this and have a new mergeSuccessor function handle all the gnarly stuff around assessing cost/mergability and value/phi updating:

// x < 0 || y < 0 -> (x|y) < 0
(If (Less(64|32|16|8) x (Const(64|32|16|8) [0])) (If (Less(64|32|16|8) y (Const(64|32|16|8) [0]) yes no) no)
        && mergeSuccessor(b, 0)
        -> (If (Less(64|32|16|8) (Or(64|32|16|8) x y) (Const(64|32|16|8) [0]) yes no)

@josharian
Copy link
Contributor Author

Not sure if this is exactly what you're after

Yes, indeed, something like this.

I scheduled it after prove since I was a bit worried it might obscure some facts.

Makes sense. The right place to document that is an entry in passOrder.

Unfortunately it is quite unpleasant adding new optimizations

Indeed. That’s why I wrote this optimization during walk the first time around. :)

I was thinking of extending the rewrite rules a bit

I’m AFK for a while (which is also why I haven’t replied to your comments on the unsafe.Unreachable issue), and it’s not quite obvious to me whether this would work in practice. It’d be lovely if it did. Perhaps we could also move the CMOV optimizations into such a form as well.

In a pinch we could also create a new kind of rewrite rule + rulegen + runner for this kind of optimization, one geared specifically to this kind of optimization.

josharian added a commit to josharian/go that referenced this issue May 7, 2019
An acceptRange is used to check whether a byte falls in a range.
The existing code used the naive way: lo <= x <= hi.
Because bytes are unsigned, this can be rewritten as x-lo <= hi-lo.

The compiler does this optimization (modulo golang#30645),
but only when lo and hi are constants.
In this case, lo and hi are effectively constants,
but the compiler can't see that.

Change-Id: I9a4a87393915168bd59e28c8048e85e4c688aca5
@andybons
Copy link
Member

@josharian would you still like to do this for 1.13?

@josharian
Copy link
Contributor Author

I think this is important to do for 1.13, yes. It looked to me like maybe @mundaym was going to do it. I guess I'll take a look at it.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/178197 mentions this issue: cmd/compile: optimize If blocks with ConstBool phi args

@josharian
Copy link
Contributor Author

I'm going to punt this to 1.14 after all. I finally have a working version, based very loosely on @mundaym's work. But it is definitely non-trivial, and Keith would (very reasonably) prefer to wait for 1.14 for CL 178197, on which my CL depends. I'll see about cleaning up and mailing my CL sooner rather than later, so it can be ready when the tree re-opens or in case of need.

@josharian josharian modified the milestones: Go1.13, Go1.14 May 27, 2019
gopherbot pushed a commit that referenced this issue Aug 27, 2019
While working on #30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.

This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.

The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)

Notable changes:

* Instead of calculating live-across-blocks values, use v.Uses == 1.
  This is cheaper and more straightforward.
  v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
  This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
  This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
  The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
  setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op       new time/op       delta
Template          213ms ± 2%        212ms ± 2%  -0.63%  (p=0.002 n=49+48)
Unicode          90.0ms ± 2%       89.8ms ± 2%    ~     (p=0.122 n=48+48)
GoTypes           710ms ± 3%        711ms ± 2%    ~     (p=0.433 n=45+49)
Compiler          3.23s ± 2%        3.22s ± 2%    ~     (p=0.124 n=47+49)
SSA               10.0s ± 1%        10.0s ± 1%  -0.43%  (p=0.000 n=48+50)
Flate             135ms ± 3%        135ms ± 2%    ~     (p=0.311 n=49+49)
GoParser          158ms ± 2%        158ms ± 2%    ~     (p=0.757 n=48+48)
Reflect           447ms ± 2%        447ms ± 2%    ~     (p=0.815 n=49+48)
Tar               189ms ± 2%        189ms ± 3%    ~     (p=0.530 n=47+49)
XML               251ms ± 3%        250ms ± 1%  -0.75%  (p=0.002 n=49+48)
[Geo mean]        427ms             426ms       -0.25%

name        old user-time/op  new user-time/op  delta
Template          265ms ± 2%        265ms ± 2%    ~     (p=0.969 n=48+50)
Unicode           119ms ± 6%        119ms ± 6%    ~     (p=0.738 n=50+50)
GoTypes           923ms ± 2%        925ms ± 2%    ~     (p=0.057 n=43+47)
Compiler          4.37s ± 2%        4.37s ± 2%    ~     (p=0.691 n=50+46)
SSA               13.4s ± 1%        13.4s ± 1%    ~     (p=0.282 n=42+49)
Flate             162ms ± 2%        162ms ± 2%    ~     (p=0.774 n=48+50)
GoParser          186ms ± 2%        186ms ± 3%    ~     (p=0.213 n=47+47)
Reflect           572ms ± 2%        573ms ± 3%    ~     (p=0.303 n=50+49)
Tar               240ms ± 3%        240ms ± 2%    ~     (p=0.939 n=46+44)
XML               302ms ± 2%        302ms ± 2%    ~     (p=0.399 n=47+47)
[Geo mean]        540ms             541ms       +0.07%

name        old alloc/op      new alloc/op      delta
Template         36.8MB ± 0%       36.7MB ± 0%  -0.42%  (p=0.008 n=5+5)
Unicode          28.1MB ± 0%       28.1MB ± 0%    ~     (p=0.151 n=5+5)
GoTypes           124MB ± 0%        124MB ± 0%  -0.26%  (p=0.008 n=5+5)
Compiler          571MB ± 0%        566MB ± 0%  -0.84%  (p=0.008 n=5+5)
SSA              1.86GB ± 0%       1.85GB ± 0%  -0.58%  (p=0.008 n=5+5)
Flate            22.8MB ± 0%       22.8MB ± 0%  -0.17%  (p=0.008 n=5+5)
GoParser         27.3MB ± 0%       27.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Reflect          79.5MB ± 0%       79.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Tar              34.7MB ± 0%       34.6MB ± 0%  -0.42%  (p=0.008 n=5+5)
XML              45.4MB ± 0%       45.3MB ± 0%  -0.29%  (p=0.008 n=5+5)
[Geo mean]       80.0MB            79.7MB       -0.34%

name        old allocs/op     new allocs/op     delta
Template           378k ± 0%         377k ± 0%  -0.22%  (p=0.008 n=5+5)
Unicode            339k ± 0%         339k ± 0%    ~     (p=0.643 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.10%  (p=0.008 n=5+5)
Compiler          5.51M ± 0%        5.50M ± 0%  -0.13%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.14%  (p=0.008 n=5+5)
Flate              234k ± 0%         234k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           299k ± 0%         299k ± 0%  -0.05%  (p=0.008 n=5+5)
Reflect            978k ± 0%         979k ± 0%  +0.02%  (p=0.016 n=5+5)
Tar                351k ± 0%         351k ± 0%  -0.04%  (p=0.008 n=5+5)
XML                435k ± 0%         435k ± 0%  -0.11%  (p=0.008 n=5+5)
[Geo mean]         840k              840k       -0.08%

file      before    after     Δ       %
go        14794788  14770212  -24576  -0.166%
addr2line 4203688   4199592   -4096   -0.097%
api       5954056   5941768   -12288  -0.206%
asm       4862704   4846320   -16384  -0.337%
cgo       4778920   4770728   -8192   -0.171%
compile   24001568  23923792  -77776  -0.324%
cover     5198440   5190248   -8192   -0.158%
dist      3595248   3587056   -8192   -0.228%
doc       4618504   4610312   -8192   -0.177%
fix       3337416   3333320   -4096   -0.123%
link      6120408   6116312   -4096   -0.067%
nm        4149064   4140872   -8192   -0.197%
objdump   4555608   4547416   -8192   -0.180%
pprof     14616324  14595844  -20480  -0.140%
test2json 2766328   2762232   -4096   -0.148%
trace     11638844  11622460  -16384  -0.141%
vet       8274936   8258552   -16384  -0.198%
total     132520780 132270972 -249808 -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
tomocy pushed a commit to tomocy/go that referenced this issue Sep 1, 2019
While working on golang#30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.

This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.

The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)

Notable changes:

* Instead of calculating live-across-blocks values, use v.Uses == 1.
  This is cheaper and more straightforward.
  v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
  This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
  This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
  The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
  setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op       new time/op       delta
Template          213ms ± 2%        212ms ± 2%  -0.63%  (p=0.002 n=49+48)
Unicode          90.0ms ± 2%       89.8ms ± 2%    ~     (p=0.122 n=48+48)
GoTypes           710ms ± 3%        711ms ± 2%    ~     (p=0.433 n=45+49)
Compiler          3.23s ± 2%        3.22s ± 2%    ~     (p=0.124 n=47+49)
SSA               10.0s ± 1%        10.0s ± 1%  -0.43%  (p=0.000 n=48+50)
Flate             135ms ± 3%        135ms ± 2%    ~     (p=0.311 n=49+49)
GoParser          158ms ± 2%        158ms ± 2%    ~     (p=0.757 n=48+48)
Reflect           447ms ± 2%        447ms ± 2%    ~     (p=0.815 n=49+48)
Tar               189ms ± 2%        189ms ± 3%    ~     (p=0.530 n=47+49)
XML               251ms ± 3%        250ms ± 1%  -0.75%  (p=0.002 n=49+48)
[Geo mean]        427ms             426ms       -0.25%

name        old user-time/op  new user-time/op  delta
Template          265ms ± 2%        265ms ± 2%    ~     (p=0.969 n=48+50)
Unicode           119ms ± 6%        119ms ± 6%    ~     (p=0.738 n=50+50)
GoTypes           923ms ± 2%        925ms ± 2%    ~     (p=0.057 n=43+47)
Compiler          4.37s ± 2%        4.37s ± 2%    ~     (p=0.691 n=50+46)
SSA               13.4s ± 1%        13.4s ± 1%    ~     (p=0.282 n=42+49)
Flate             162ms ± 2%        162ms ± 2%    ~     (p=0.774 n=48+50)
GoParser          186ms ± 2%        186ms ± 3%    ~     (p=0.213 n=47+47)
Reflect           572ms ± 2%        573ms ± 3%    ~     (p=0.303 n=50+49)
Tar               240ms ± 3%        240ms ± 2%    ~     (p=0.939 n=46+44)
XML               302ms ± 2%        302ms ± 2%    ~     (p=0.399 n=47+47)
[Geo mean]        540ms             541ms       +0.07%

name        old alloc/op      new alloc/op      delta
Template         36.8MB ± 0%       36.7MB ± 0%  -0.42%  (p=0.008 n=5+5)
Unicode          28.1MB ± 0%       28.1MB ± 0%    ~     (p=0.151 n=5+5)
GoTypes           124MB ± 0%        124MB ± 0%  -0.26%  (p=0.008 n=5+5)
Compiler          571MB ± 0%        566MB ± 0%  -0.84%  (p=0.008 n=5+5)
SSA              1.86GB ± 0%       1.85GB ± 0%  -0.58%  (p=0.008 n=5+5)
Flate            22.8MB ± 0%       22.8MB ± 0%  -0.17%  (p=0.008 n=5+5)
GoParser         27.3MB ± 0%       27.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Reflect          79.5MB ± 0%       79.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Tar              34.7MB ± 0%       34.6MB ± 0%  -0.42%  (p=0.008 n=5+5)
XML              45.4MB ± 0%       45.3MB ± 0%  -0.29%  (p=0.008 n=5+5)
[Geo mean]       80.0MB            79.7MB       -0.34%

name        old allocs/op     new allocs/op     delta
Template           378k ± 0%         377k ± 0%  -0.22%  (p=0.008 n=5+5)
Unicode            339k ± 0%         339k ± 0%    ~     (p=0.643 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.10%  (p=0.008 n=5+5)
Compiler          5.51M ± 0%        5.50M ± 0%  -0.13%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.14%  (p=0.008 n=5+5)
Flate              234k ± 0%         234k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           299k ± 0%         299k ± 0%  -0.05%  (p=0.008 n=5+5)
Reflect            978k ± 0%         979k ± 0%  +0.02%  (p=0.016 n=5+5)
Tar                351k ± 0%         351k ± 0%  -0.04%  (p=0.008 n=5+5)
XML                435k ± 0%         435k ± 0%  -0.11%  (p=0.008 n=5+5)
[Geo mean]         840k              840k       -0.08%

file      before    after     Δ       %
go        14794788  14770212  -24576  -0.166%
addr2line 4203688   4199592   -4096   -0.097%
api       5954056   5941768   -12288  -0.206%
asm       4862704   4846320   -16384  -0.337%
cgo       4778920   4770728   -8192   -0.171%
compile   24001568  23923792  -77776  -0.324%
cover     5198440   5190248   -8192   -0.158%
dist      3595248   3587056   -8192   -0.228%
doc       4618504   4610312   -8192   -0.177%
fix       3337416   3333320   -4096   -0.123%
link      6120408   6116312   -4096   -0.067%
nm        4149064   4140872   -8192   -0.197%
objdump   4555608   4547416   -8192   -0.180%
pprof     14616324  14595844  -20480  -0.140%
test2json 2766328   2762232   -4096   -0.148%
trace     11638844  11622460  -16384  -0.141%
vet       8274936   8258552   -16384  -0.198%
total     132520780 132270972 -249808 -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
While working on golang#30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.

This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.

The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)

Notable changes:

* Instead of calculating live-across-blocks values, use v.Uses == 1.
  This is cheaper and more straightforward.
  v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
  This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
  This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
  The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
  setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op       new time/op       delta
Template          213ms ± 2%        212ms ± 2%  -0.63%  (p=0.002 n=49+48)
Unicode          90.0ms ± 2%       89.8ms ± 2%    ~     (p=0.122 n=48+48)
GoTypes           710ms ± 3%        711ms ± 2%    ~     (p=0.433 n=45+49)
Compiler          3.23s ± 2%        3.22s ± 2%    ~     (p=0.124 n=47+49)
SSA               10.0s ± 1%        10.0s ± 1%  -0.43%  (p=0.000 n=48+50)
Flate             135ms ± 3%        135ms ± 2%    ~     (p=0.311 n=49+49)
GoParser          158ms ± 2%        158ms ± 2%    ~     (p=0.757 n=48+48)
Reflect           447ms ± 2%        447ms ± 2%    ~     (p=0.815 n=49+48)
Tar               189ms ± 2%        189ms ± 3%    ~     (p=0.530 n=47+49)
XML               251ms ± 3%        250ms ± 1%  -0.75%  (p=0.002 n=49+48)
[Geo mean]        427ms             426ms       -0.25%

name        old user-time/op  new user-time/op  delta
Template          265ms ± 2%        265ms ± 2%    ~     (p=0.969 n=48+50)
Unicode           119ms ± 6%        119ms ± 6%    ~     (p=0.738 n=50+50)
GoTypes           923ms ± 2%        925ms ± 2%    ~     (p=0.057 n=43+47)
Compiler          4.37s ± 2%        4.37s ± 2%    ~     (p=0.691 n=50+46)
SSA               13.4s ± 1%        13.4s ± 1%    ~     (p=0.282 n=42+49)
Flate             162ms ± 2%        162ms ± 2%    ~     (p=0.774 n=48+50)
GoParser          186ms ± 2%        186ms ± 3%    ~     (p=0.213 n=47+47)
Reflect           572ms ± 2%        573ms ± 3%    ~     (p=0.303 n=50+49)
Tar               240ms ± 3%        240ms ± 2%    ~     (p=0.939 n=46+44)
XML               302ms ± 2%        302ms ± 2%    ~     (p=0.399 n=47+47)
[Geo mean]        540ms             541ms       +0.07%

name        old alloc/op      new alloc/op      delta
Template         36.8MB ± 0%       36.7MB ± 0%  -0.42%  (p=0.008 n=5+5)
Unicode          28.1MB ± 0%       28.1MB ± 0%    ~     (p=0.151 n=5+5)
GoTypes           124MB ± 0%        124MB ± 0%  -0.26%  (p=0.008 n=5+5)
Compiler          571MB ± 0%        566MB ± 0%  -0.84%  (p=0.008 n=5+5)
SSA              1.86GB ± 0%       1.85GB ± 0%  -0.58%  (p=0.008 n=5+5)
Flate            22.8MB ± 0%       22.8MB ± 0%  -0.17%  (p=0.008 n=5+5)
GoParser         27.3MB ± 0%       27.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Reflect          79.5MB ± 0%       79.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Tar              34.7MB ± 0%       34.6MB ± 0%  -0.42%  (p=0.008 n=5+5)
XML              45.4MB ± 0%       45.3MB ± 0%  -0.29%  (p=0.008 n=5+5)
[Geo mean]       80.0MB            79.7MB       -0.34%

name        old allocs/op     new allocs/op     delta
Template           378k ± 0%         377k ± 0%  -0.22%  (p=0.008 n=5+5)
Unicode            339k ± 0%         339k ± 0%    ~     (p=0.643 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.10%  (p=0.008 n=5+5)
Compiler          5.51M ± 0%        5.50M ± 0%  -0.13%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.14%  (p=0.008 n=5+5)
Flate              234k ± 0%         234k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           299k ± 0%         299k ± 0%  -0.05%  (p=0.008 n=5+5)
Reflect            978k ± 0%         979k ± 0%  +0.02%  (p=0.016 n=5+5)
Tar                351k ± 0%         351k ± 0%  -0.04%  (p=0.008 n=5+5)
XML                435k ± 0%         435k ± 0%  -0.11%  (p=0.008 n=5+5)
[Geo mean]         840k              840k       -0.08%

file      before    after     Δ       %
go        14794788  14770212  -24576  -0.166%
addr2line 4203688   4199592   -4096   -0.097%
api       5954056   5941768   -12288  -0.206%
asm       4862704   4846320   -16384  -0.337%
cgo       4778920   4770728   -8192   -0.171%
compile   24001568  23923792  -77776  -0.324%
cover     5198440   5190248   -8192   -0.158%
dist      3595248   3587056   -8192   -0.228%
doc       4618504   4610312   -8192   -0.177%
fix       3337416   3333320   -4096   -0.123%
link      6120408   6116312   -4096   -0.067%
nm        4149064   4140872   -8192   -0.197%
objdump   4555608   4547416   -8192   -0.180%
pprof     14616324  14595844  -20480  -0.140%
test2json 2766328   2762232   -4096   -0.148%
trace     11638844  11622460  -16384  -0.141%
vet       8274936   8258552   -16384  -0.198%
total     132520780 132270972 -249808 -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@randall77
Copy link
Contributor

@mundaym @josharian Were either of you guys planning on submitting something here?

@mundaym
Copy link
Member

mundaym commented Oct 15, 2019

I'd like to make block rewrite rules more powerful but I don't think I'll be able to get it done before the freeze because it is quite fiddly (happy to help if someone else wants to take a look). I'm happy to write a pass to do this specific optimization though if that would be useful. Most of the tricky stuff I worked through in the CL I posted previously.

@randall77
Copy link
Contributor

Just finishing your CL for this specific optimization would be good.

@josharian
Copy link
Contributor Author

I will also dig up my work in progress and mail it, in case it is helpful. Probably this afternoon.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/201206 mentions this issue: cmd/compile: optimize integer-in-range checks

@josharian
Copy link
Contributor Author

Mailed CL 201206 in the hopes that it is helpful. I only vaguely remember where I left it, but I'm pretty sure it's not done. Sorry.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/165998 mentions this issue: cmd/compile: optimize integer-in-range checks

@randall77
Copy link
Contributor

I'm going to punt this to 1.15. There is a CL, but it could use some more work, it's late, and it is just a performance improvement.

@randall77 randall77 modified the milestones: Go1.14, Go1.15 Nov 12, 2019
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/221802 mentions this issue: cmd/compile: remove walkinrange optimization

gopherbot pushed a commit that referenced this issue Mar 3, 2020
The walkinrange optimization has been superseded by CL 165998.

Has a small positive impact on binary sizes:

compilecmp master -> HEAD
master (e37cc29): cmd/compile: optimize integer-in-range checks
HEAD (1a70680a34): cmd/compile: remove walkinrange optimization
platform: linux/amd64

file      before    after     Δ       %
addr2line 4329144   4325048   -4096   -0.095%
api       6060970   6056874   -4096   -0.068%
asm       5196905   5192809   -4096   -0.079%
cgo       4898769   4890577   -8192   -0.167%
compile   20222193  20209713  -12480  -0.062%
cover     5331580   5323388   -8192   -0.154%
dist      3732778   3728682   -4096   -0.110%
doc       4748488   4740296   -8192   -0.173%
link      6707380   6695092   -12288  -0.183%
nm        4278685   4274589   -4096   -0.096%
pack      2305038   2300942   -4096   -0.178%
pprof     14874834  14870738  -4096   -0.028%
test2json 2849221   2845125   -4096   -0.144%
vet       8393173   8384981   -8192   -0.098%
go        15205572  15193284  -12288  -0.081%
total     131812292 131709700 -102592 -0.078%

Updates #30645.

Change-Id: I42d74481652c90fef1a9bc58c70836e42c9b1c4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/221802
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
@golang golang locked and limited conversation to collaborators Mar 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

6 participants