Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: unneeded MOV's caused by channel operations #35969

Open
mariecurried opened this issue Dec 4, 2019 · 3 comments
Open

cmd/compile: unneeded MOV's caused by channel operations #35969

mariecurried opened this issue Dec 4, 2019 · 3 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@mariecurried
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.13 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

I compiled the following function:

package test

func pingPong(c chan int) {
    for {
        x := <- c
        c <- x
    }
}

What did you expect to see?

I expected that the generated code would not contain unnecessary instructions.

What did you see instead?

Instead, there are a couple of move instructions that don't accomplish anything (lines 25-26 in https://godbolt.org/z/RnTRWz).

@randall77 randall77 added this to the Unplanned milestone Dec 4, 2019
@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 5, 2019
@ivzhh
Copy link

ivzhh commented Dec 16, 2019

The syntax <- chan will be translated to runtime call: func chanrecv1(c *hchan, elem unsafe.Pointer) . The address of the receiver will be passed as the second argument.

x := <- c will create ORECV node; later in order.go, temporary variable .autotmp_2
is created and the AST will be translated as:

var .autotmp_2 int
chanrecv1(c, &.autotmp_2)
x := *(&.autotmp_2)

order.go issues VarKill and Vardef for this temporary variable .autotmp_2.

The SSA result in Opt pass:

v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17
v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18
v20 (5) = Load <int> v19 v18 (x[int])
v21 (4) = VarKill <mem> {.autotmp_2} v18
v22 (6) = VarDef <mem> {.autotmp_2} v21
v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22
v24 (6) = Store <mem> {int} v23 v20 v22

The cmd/compile/internal/ssa/gen/generic.rules contains rules to remove extra store/load:

Eliminate stores of values that have just been loaded from the same location.

However, the (Store (LocalAddr (... pattern is not included in generic.rules.

@ivzhh
Copy link

ivzhh commented Dec 21, 2019

I have a fix now. Will submit to CL soon.

ivzhh added a commit to ivzhh/go that referenced this issue Dec 21, 2019
This commit optimizes extra MOVs, created by variable
definition and channel receiving <- operation.

The syntax <- chan will be translated to runtime call:
func chanrecv1(c *hchan, elem unsafe.Pointer).
Therefore, syntax <- c creates an addressable temporary variable,
and the address of this temporary variable is passed as the
second argument. order.go issues VarKill and Vardef
for this temporary variable.

So syntax x := <- c; c <- x will do it in three steps:

* pass address of temporary variable to chanrecv1 and call
* load value of the variable into register and mark death of
temporary variable
* mark creation of variable x and store value into x

The relevant SSA result are as following:

v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17
v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18
v20 (5) = Load <int> v19 v18 (x[int])
v21 (4) = VarKill <mem> {.autotmp_2} v18
v22 (6) = VarDef <mem> {.autotmp_2} v21
v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22
v24 (6) = Store <mem> {int} v23 v20 v22

This commit finds this temporary variable and extends the lifetime
of this temporary variable.
The same address is killed by VarKill and recreated by
VarDef. The two operations are eliminated.

A benchmark is performed on $GOROOT/test/bench/go1. The results are
compared between:

* old: go version devel +3f51350c70
* new: +3f51350c70 + this patch

name                      old time/op    new time/op    delta
BinaryTree17-12              1.96s ± 0%     1.96s ± 0%   ~     (p=1.000 n=1+1)
Fannkuch11-12                1.78s ± 0%     1.78s ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfEmpty-12          25.4ns ± 0%    25.5ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfString-12         44.3ns ± 0%    43.3ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfInt-12            48.6ns ± 0%    49.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfIntInt-12         77.0ns ± 0%    79.2ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfPrefixedInt-12    81.9ns ± 0%    83.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfFloat-12           127ns ± 0%     127ns ± 0%   ~     (all equal)
FmtManyArgs-12               312ns ± 0%     310ns ± 0%   ~     (p=1.000 n=1+1)
GobDecode-12                3.27ms ± 0%    3.28ms ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12                2.70ms ± 0%    2.66ms ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                      137ms ± 0%     138ms ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                   21.4ms ± 0%    22.1ms ± 0%   ~     (p=1.000 n=1+1)
HTTPClientServer-12         40.4µs ± 0%    40.3µs ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12               5.53ms ± 0%    5.59ms ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12               24.7ms ± 0%    24.6ms ± 0%   ~     (p=1.000 n=1+1)
Mandelbrot200-12            2.84ms ± 0%    2.84ms ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                  2.02ms ± 0%    1.99ms ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12      40.6ns ± 0%    41.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12       108ns ± 0%     111ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12      36.0ns ± 0%    36.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12       186ns ± 0%     185ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12     3.56ns ± 0%    3.56ns ± 0%   ~     (all equal)
RegexpMatchMedium_1K-12     20.8µs ± 0%    20.5µs ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12        984ns ± 0%     961ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12       28.8µs ± 0%    28.8µs ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                   276ms ± 0%     281ms ± 0%   ~     (p=1.000 n=1+1)
Template-12                 32.0ms ± 0%    31.4ms ± 0%   ~     (p=1.000 n=1+1)
TimeParse-12                 204ns ± 0%     202ns ± 0%   ~     (p=1.000 n=1+1)
TimeFormat-12                187ns ± 0%     188ns ± 0%   ~     (p=1.000 n=1+1)

name                      old speed      new speed      delta
GobDecode-12               235MB/s ± 0%   234MB/s ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12               284MB/s ± 0%   288MB/s ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                    142MB/s ± 0%   141MB/s ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                  906MB/s ± 0%   879MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12              351MB/s ± 0%   347MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12             78.6MB/s ± 0%  78.8MB/s ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                28.7MB/s ± 0%  29.1MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12     788MB/s ± 0%   771MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12    9.44GB/s ± 0%  9.26GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12     888MB/s ± 0%   877MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12    5.50GB/s ± 0%  5.54GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12    281MB/s ± 0%   281MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_1K-12   49.3MB/s ± 0%  50.0MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12     32.5MB/s ± 0%  33.3MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12     35.5MB/s ± 0%  35.6MB/s ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                 921MB/s ± 0%   905MB/s ± 0%   ~     (p=1.000 n=1+1)
Template-12               60.6MB/s ± 0%  61.8MB/s ± 0%   ~     (p=1.000 n=1+1)

Fixes golang#35969
@gopherbot
Copy link

Change https://golang.org/cl/212303 mentions this issue: cmd/compile: remove unnecessary MOV in channel receiving

ivzhh added a commit to ivzhh/go that referenced this issue Dec 22, 2019
This commit optimizes extra MOVs, created by variable
definition and channel receiving <- operation.

The syntax <- chan will be translated to runtime call:
func chanrecv1(c *hchan, elem unsafe.Pointer).
Therefore, syntax <- c creates an addressable temporary variable,
and the address of this temporary variable is passed as the
second argument. order.go issues VarKill and Vardef
for this temporary variable.

So syntax x := <- c; c <- x will do it in three steps:

* pass address of temporary variable to chanrecv1 and call
* load value of the variable into register and mark death of
temporary variable
* mark creation of variable x and store value into x

The relevant SSA result are as following:

v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17
v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18
v20 (5) = Load <int> v19 v18 (x[int])
v21 (4) = VarKill <mem> {.autotmp_2} v18
v22 (6) = VarDef <mem> {.autotmp_2} v21
v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22
v24 (6) = Store <mem> {int} v23 v20 v22

This commit finds this temporary variable and extends the lifetime
of this temporary variable.
The same address is killed by VarKill and recreated by
VarDef. The two operations are eliminated.

A benchmark is performed on $GOROOT/test/bench/go1. The results are
compared between:

* old: go version devel +3f51350c70
* new: +3f51350c70 + this patch

name                      old time/op    new time/op    delta
BinaryTree17-12              1.96s ± 0%     1.96s ± 0%   ~     (p=1.000 n=1+1)
Fannkuch11-12                1.78s ± 0%     1.78s ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfEmpty-12          25.4ns ± 0%    25.5ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfString-12         44.3ns ± 0%    43.3ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfInt-12            48.6ns ± 0%    49.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfIntInt-12         77.0ns ± 0%    79.2ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfPrefixedInt-12    81.9ns ± 0%    83.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfFloat-12           127ns ± 0%     127ns ± 0%   ~     (all equal)
FmtManyArgs-12               312ns ± 0%     310ns ± 0%   ~     (p=1.000 n=1+1)
GobDecode-12                3.27ms ± 0%    3.28ms ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12                2.70ms ± 0%    2.66ms ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                      137ms ± 0%     138ms ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                   21.4ms ± 0%    22.1ms ± 0%   ~     (p=1.000 n=1+1)
HTTPClientServer-12         40.4µs ± 0%    40.3µs ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12               5.53ms ± 0%    5.59ms ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12               24.7ms ± 0%    24.6ms ± 0%   ~     (p=1.000 n=1+1)
Mandelbrot200-12            2.84ms ± 0%    2.84ms ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                  2.02ms ± 0%    1.99ms ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12      40.6ns ± 0%    41.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12       108ns ± 0%     111ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12      36.0ns ± 0%    36.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12       186ns ± 0%     185ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12     3.56ns ± 0%    3.56ns ± 0%   ~     (all equal)
RegexpMatchMedium_1K-12     20.8µs ± 0%    20.5µs ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12        984ns ± 0%     961ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12       28.8µs ± 0%    28.8µs ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                   276ms ± 0%     281ms ± 0%   ~     (p=1.000 n=1+1)
Template-12                 32.0ms ± 0%    31.4ms ± 0%   ~     (p=1.000 n=1+1)
TimeParse-12                 204ns ± 0%     202ns ± 0%   ~     (p=1.000 n=1+1)
TimeFormat-12                187ns ± 0%     188ns ± 0%   ~     (p=1.000 n=1+1)

name                      old speed      new speed      delta
GobDecode-12               235MB/s ± 0%   234MB/s ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12               284MB/s ± 0%   288MB/s ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                    142MB/s ± 0%   141MB/s ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                  906MB/s ± 0%   879MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12              351MB/s ± 0%   347MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12             78.6MB/s ± 0%  78.8MB/s ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                28.7MB/s ± 0%  29.1MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12     788MB/s ± 0%   771MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12    9.44GB/s ± 0%  9.26GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12     888MB/s ± 0%   877MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12    5.50GB/s ± 0%  5.54GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12    281MB/s ± 0%   281MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_1K-12   49.3MB/s ± 0%  50.0MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12     32.5MB/s ± 0%  33.3MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12     35.5MB/s ± 0%  35.6MB/s ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                 921MB/s ± 0%   905MB/s ± 0%   ~     (p=1.000 n=1+1)
Template-12               60.6MB/s ± 0%  61.8MB/s ± 0%   ~     (p=1.000 n=1+1)

Fixes golang#35969
ivzhh added a commit to ivzhh/go that referenced this issue Dec 22, 2019
This commit optimizes extra MOVs, created by variable
definition and channel receiving <- operation.

The syntax <- chan will be translated to runtime call:
func chanrecv1(c *hchan, elem unsafe.Pointer).
Therefore, syntax <- c creates an addressable temporary variable,
and the address of this temporary variable is passed as the
second argument. order.go issues VarKill and Vardef
for this temporary variable.

So syntax x := <- c; c <- x will do it in three steps:

* pass address of temporary variable to chanrecv1 and call
* load value of the variable into register and mark death of
temporary variable
* mark creation of variable x and store value into x

The relevant SSA result are as following:

v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17
v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18
v20 (5) = Load <int> v19 v18 (x[int])
v21 (4) = VarKill <mem> {.autotmp_2} v18
v22 (6) = VarDef <mem> {.autotmp_2} v21
v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22
v24 (6) = Store <mem> {int} v23 v20 v22

This commit finds this temporary variable and extends the lifetime
of this temporary variable.
The same address is killed by VarKill and recreated by
VarDef. The two operations are eliminated.

InlMark is a special case: it won't return a dummy mem,
which can be used by next SSA. This can cause
a failure in the following case during pass ssa/schedule.go:

v185 (864) = InterCall <mem> [32] v180 v184
v187 (864) = Load <Range> v186 v185 (r[Range])
v188 (864) = LocalAddr <*ssa.Value> {.autotmp_18} v2 v185
v189 (864) = Load <ssa.Value> v188 v185 (x[ssa.Value])
v190 (863) = VarKill <mem> {.autotmp_18} v185
v191 (864) = InlMark <void> [414] v190
v192 (708) = VarDef <mem> {.autotmp_18} v190
v193 (708) = LocalAddr <*ssa.Value> {.autotmp_18} v2 v192
v194 (708) = Store <mem> {ssa.Value} v193 v189 v192

The InlMark (v191) keeps v190 alive and therefore v190 (VarKill)
is kept, while corresponding Vardef is eliminated.
The dummy mem() is taken squentially, thus let InlMark use
the mem() before VarKill.

A regex search ^\s+s\.newValue is used to find those cases
that returned value of s.newValue*() is not utilized.
The only two matches are InlMark and NilCheck.
However, NilCheck cannot work with this fix (unknown reason,
error in building go toolchain).

To validate this fix, the following tests have been done:

* build go on Linux/amd64 (all.bash)
* build go on darwin64 (all.bash)
* build honnef.co/go/tools@v0.0.1-2019.2.3 on Linux/amd64
* build and pass tests for prometheus
* build and pass tests pingcap/tidb

A benchmark is performed on $GOROOT/test/bench/go1. The results are
compared between:

* old: go version devel +3f51350c70
* new: +3f51350c70 + this patch

name                      old time/op    new time/op    delta
BinaryTree17-12              1.96s ± 0%     1.96s ± 0%   ~     (p=1.000 n=1+1)
Fannkuch11-12                1.78s ± 0%     1.78s ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfEmpty-12          25.4ns ± 0%    25.5ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfString-12         44.3ns ± 0%    43.3ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfInt-12            48.6ns ± 0%    49.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfIntInt-12         77.0ns ± 0%    79.2ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfPrefixedInt-12    81.9ns ± 0%    83.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfFloat-12           127ns ± 0%     127ns ± 0%   ~     (all equal)
FmtManyArgs-12               312ns ± 0%     310ns ± 0%   ~     (p=1.000 n=1+1)
GobDecode-12                3.27ms ± 0%    3.28ms ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12                2.70ms ± 0%    2.66ms ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                      137ms ± 0%     138ms ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                   21.4ms ± 0%    22.1ms ± 0%   ~     (p=1.000 n=1+1)
HTTPClientServer-12         40.4µs ± 0%    40.3µs ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12               5.53ms ± 0%    5.59ms ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12               24.7ms ± 0%    24.6ms ± 0%   ~     (p=1.000 n=1+1)
Mandelbrot200-12            2.84ms ± 0%    2.84ms ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                  2.02ms ± 0%    1.99ms ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12      40.6ns ± 0%    41.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12       108ns ± 0%     111ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12      36.0ns ± 0%    36.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12       186ns ± 0%     185ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12     3.56ns ± 0%    3.56ns ± 0%   ~     (all equal)
RegexpMatchMedium_1K-12     20.8µs ± 0%    20.5µs ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12        984ns ± 0%     961ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12       28.8µs ± 0%    28.8µs ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                   276ms ± 0%     281ms ± 0%   ~     (p=1.000 n=1+1)
Template-12                 32.0ms ± 0%    31.4ms ± 0%   ~     (p=1.000 n=1+1)
TimeParse-12                 204ns ± 0%     202ns ± 0%   ~     (p=1.000 n=1+1)
TimeFormat-12                187ns ± 0%     188ns ± 0%   ~     (p=1.000 n=1+1)

name                      old speed      new speed      delta
GobDecode-12               235MB/s ± 0%   234MB/s ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12               284MB/s ± 0%   288MB/s ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                    142MB/s ± 0%   141MB/s ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                  906MB/s ± 0%   879MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12              351MB/s ± 0%   347MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12             78.6MB/s ± 0%  78.8MB/s ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                28.7MB/s ± 0%  29.1MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12     788MB/s ± 0%   771MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12    9.44GB/s ± 0%  9.26GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12     888MB/s ± 0%   877MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12    5.50GB/s ± 0%  5.54GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12    281MB/s ± 0%   281MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_1K-12   49.3MB/s ± 0%  50.0MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12     32.5MB/s ± 0%  33.3MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12     35.5MB/s ± 0%  35.6MB/s ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                 921MB/s ± 0%   905MB/s ± 0%   ~     (p=1.000 n=1+1)
Template-12               60.6MB/s ± 0%  61.8MB/s ± 0%   ~     (p=1.000 n=1+1)

Fixes golang#35969
ivzhh added a commit to ivzhh/go that referenced this issue Dec 29, 2019
This commit optimizes extra MOVs, created by variable
definition and channel receiving <- operation.

The syntax <- chan will be translated to runtime call:
func chanrecv1(c *hchan, elem unsafe.Pointer).
Therefore, syntax <- c creates an addressable temporary variable,
and the address of this temporary variable is passed as the
second argument. order.go issues VarKill and Vardef
for this temporary variable.

So syntax x := <- c; c <- x will do it in three steps:

* pass address of temporary variable to chanrecv1 and call
* load value of the variable into register and mark death of
temporary variable
* mark creation of variable x and store value into x

The relevant SSA result are as following:

v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17
v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18
v20 (5) = Load <int> v19 v18 (x[int])
v21 (4) = VarKill <mem> {.autotmp_2} v18
v22 (6) = VarDef <mem> {.autotmp_2} v21
v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22
v24 (6) = Store <mem> {int} v23 v20 v22

This commit finds this temporary variable and extends the lifetime
of this temporary variable.
The same address is killed by VarKill and recreated by
VarDef. The two operations are eliminated.

However, there is a special case to prevent this rule from being
applied. SSA values' with "typ void" as type can refer to
VarKill, which can lead to a result that VarKill is kept and
VarDef is removed.

The solution is to limit the Uses count on the VarDef and VarKill,
because their side-effect changes the liveness of variables.
VarDef must be referred twice: LocalAddr and Store in the same
pattern; VarKill must be only referred by VarKill because it is
a temporary variable.

To validate this fix, the following tests have been done:

* build go on Linux/amd64 (all.bash)
* build go on darwin64 (all.bash)
* build go on windows/amd64 (all.bat, CGO_ENABLED=0)
* build honnef.co/go/tools@v0.0.1-2019.2.3 on Linux/amd64
* build and pass tests for prometheus
* build and pass tests pingcap/tidb

A benchmark is performed on $GOROOT/test/bench/go1. The results are
compared between:

* old: go version devel +3f51350c70
* new: +3f51350c70 + this patch

name                      old time/op    new time/op    delta
BinaryTree17-12              1.96s ± 0%     1.96s ± 0%   ~     (p=1.000 n=1+1)
Fannkuch11-12                1.78s ± 0%     1.78s ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfEmpty-12          25.4ns ± 0%    25.5ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfString-12         44.3ns ± 0%    43.3ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfInt-12            48.6ns ± 0%    49.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfIntInt-12         77.0ns ± 0%    79.2ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfPrefixedInt-12    81.9ns ± 0%    83.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfFloat-12           127ns ± 0%     127ns ± 0%   ~     (all equal)
FmtManyArgs-12               312ns ± 0%     310ns ± 0%   ~     (p=1.000 n=1+1)
GobDecode-12                3.27ms ± 0%    3.28ms ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12                2.70ms ± 0%    2.66ms ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                      137ms ± 0%     138ms ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                   21.4ms ± 0%    22.1ms ± 0%   ~     (p=1.000 n=1+1)
HTTPClientServer-12         40.4µs ± 0%    40.3µs ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12               5.53ms ± 0%    5.59ms ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12               24.7ms ± 0%    24.6ms ± 0%   ~     (p=1.000 n=1+1)
Mandelbrot200-12            2.84ms ± 0%    2.84ms ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                  2.02ms ± 0%    1.99ms ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12      40.6ns ± 0%    41.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12       108ns ± 0%     111ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12      36.0ns ± 0%    36.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12       186ns ± 0%     185ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12     3.56ns ± 0%    3.56ns ± 0%   ~     (all equal)
RegexpMatchMedium_1K-12     20.8µs ± 0%    20.5µs ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12        984ns ± 0%     961ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12       28.8µs ± 0%    28.8µs ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                   276ms ± 0%     281ms ± 0%   ~     (p=1.000 n=1+1)
Template-12                 32.0ms ± 0%    31.4ms ± 0%   ~     (p=1.000 n=1+1)
TimeParse-12                 204ns ± 0%     202ns ± 0%   ~     (p=1.000 n=1+1)
TimeFormat-12                187ns ± 0%     188ns ± 0%   ~     (p=1.000 n=1+1)

name                      old speed      new speed      delta
GobDecode-12               235MB/s ± 0%   234MB/s ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12               284MB/s ± 0%   288MB/s ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                    142MB/s ± 0%   141MB/s ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                  906MB/s ± 0%   879MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12              351MB/s ± 0%   347MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12             78.6MB/s ± 0%  78.8MB/s ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                28.7MB/s ± 0%  29.1MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12     788MB/s ± 0%   771MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12    9.44GB/s ± 0%  9.26GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12     888MB/s ± 0%   877MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12    5.50GB/s ± 0%  5.54GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12    281MB/s ± 0%   281MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_1K-12   49.3MB/s ± 0%  50.0MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12     32.5MB/s ± 0%  33.3MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12     35.5MB/s ± 0%  35.6MB/s ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                 921MB/s ± 0%   905MB/s ± 0%   ~     (p=1.000 n=1+1)
Template-12               60.6MB/s ± 0%  61.8MB/s ± 0%   ~     (p=1.000 n=1+1)

Fixes golang#35969
ivzhh added a commit to ivzhh/go that referenced this issue Dec 29, 2019
This commit optimizes extra MOVs, created by variable
definition and channel receiving <- operation.

The syntax <- chan will be translated to runtime call:
func chanrecv1(c *hchan, elem unsafe.Pointer).
Therefore, syntax <- c creates an addressable temporary variable,
and the address of this temporary variable is passed as the
second argument. order.go issues VarKill and Vardef
for this temporary variable.

So syntax x := <- c; c <- x will do it in three steps:

* pass address of temporary variable to chanrecv1 and call
* load value of the variable into register and mark death of
temporary variable
* mark creation of variable x and store value into x

The relevant SSA result are as following:

v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17
v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18
v20 (5) = Load <int> v19 v18 (x[int])
v21 (4) = VarKill <mem> {.autotmp_2} v18
v22 (6) = VarDef <mem> {.autotmp_2} v21
v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22
v24 (6) = Store <mem> {int} v23 v20 v22

This commit finds this temporary variable and extends the lifetime
of this temporary variable.
The same address is killed by VarKill and recreated by
VarDef. The two operations are eliminated.

However, there is a special case to prevent this rule from being
applied. SSA values' with "typ void" as type can refer to
VarKill, which can lead to a result that VarKill is kept and
VarDef is removed.

The solution is to limit the Uses count on the VarDef and VarKill,
because their side-effect changes the liveness of variables.
VarDef must be referred twice: LocalAddr and Store in the same
pattern; VarKill must be only referred by VarKill because it is
a temporary variable.

To validate this fix, the following tests have been done:

* build go on Linux/amd64 (all.bash)
* build go on darwin64 (all.bash)
* build honnef.co/go/tools@v0.0.1-2019.2.3 on Linux/amd64
* build and pass tests for prometheus
* build and pass tests pingcap/tidb

A benchmark is performed on $GOROOT/test/bench/go1. The results are
compared between:

* old: go version devel +3f51350c70
* new: +3f51350c70 + this patch

name                      old time/op    new time/op    delta
BinaryTree17-12              1.96s ± 0%     1.96s ± 0%   ~     (p=1.000 n=1+1)
Fannkuch11-12                1.78s ± 0%     1.78s ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfEmpty-12          25.4ns ± 0%    25.5ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfString-12         44.3ns ± 0%    43.3ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfInt-12            48.6ns ± 0%    49.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfIntInt-12         77.0ns ± 0%    79.2ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfPrefixedInt-12    81.9ns ± 0%    83.1ns ± 0%   ~     (p=1.000 n=1+1)
FmtFprintfFloat-12           127ns ± 0%     127ns ± 0%   ~     (all equal)
FmtManyArgs-12               312ns ± 0%     310ns ± 0%   ~     (p=1.000 n=1+1)
GobDecode-12                3.27ms ± 0%    3.28ms ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12                2.70ms ± 0%    2.66ms ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                      137ms ± 0%     138ms ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                   21.4ms ± 0%    22.1ms ± 0%   ~     (p=1.000 n=1+1)
HTTPClientServer-12         40.4µs ± 0%    40.3µs ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12               5.53ms ± 0%    5.59ms ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12               24.7ms ± 0%    24.6ms ± 0%   ~     (p=1.000 n=1+1)
Mandelbrot200-12            2.84ms ± 0%    2.84ms ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                  2.02ms ± 0%    1.99ms ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12      40.6ns ± 0%    41.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12       108ns ± 0%     111ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12      36.0ns ± 0%    36.5ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12       186ns ± 0%     185ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12     3.56ns ± 0%    3.56ns ± 0%   ~     (all equal)
RegexpMatchMedium_1K-12     20.8µs ± 0%    20.5µs ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12        984ns ± 0%     961ns ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12       28.8µs ± 0%    28.8µs ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                   276ms ± 0%     281ms ± 0%   ~     (p=1.000 n=1+1)
Template-12                 32.0ms ± 0%    31.4ms ± 0%   ~     (p=1.000 n=1+1)
TimeParse-12                 204ns ± 0%     202ns ± 0%   ~     (p=1.000 n=1+1)
TimeFormat-12                187ns ± 0%     188ns ± 0%   ~     (p=1.000 n=1+1)

name                      old speed      new speed      delta
GobDecode-12               235MB/s ± 0%   234MB/s ± 0%   ~     (p=1.000 n=1+1)
GobEncode-12               284MB/s ± 0%   288MB/s ± 0%   ~     (p=1.000 n=1+1)
Gzip-12                    142MB/s ± 0%   141MB/s ± 0%   ~     (p=1.000 n=1+1)
Gunzip-12                  906MB/s ± 0%   879MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONEncode-12              351MB/s ± 0%   347MB/s ± 0%   ~     (p=1.000 n=1+1)
JSONDecode-12             78.6MB/s ± 0%  78.8MB/s ± 0%   ~     (p=1.000 n=1+1)
GoParse-12                28.7MB/s ± 0%  29.1MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_32-12     788MB/s ± 0%   771MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy0_1K-12    9.44GB/s ± 0%  9.26GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_32-12     888MB/s ± 0%   877MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchEasy1_1K-12    5.50GB/s ± 0%  5.54GB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_32-12    281MB/s ± 0%   281MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchMedium_1K-12   49.3MB/s ± 0%  50.0MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_32-12     32.5MB/s ± 0%  33.3MB/s ± 0%   ~     (p=1.000 n=1+1)
RegexpMatchHard_1K-12     35.5MB/s ± 0%  35.6MB/s ± 0%   ~     (p=1.000 n=1+1)
Revcomp-12                 921MB/s ± 0%   905MB/s ± 0%   ~     (p=1.000 n=1+1)
Template-12               60.6MB/s ± 0%  61.8MB/s ± 0%   ~     (p=1.000 n=1+1)

Fixes golang#35969
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants