-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: unneeded MOV's caused by channel operations #35969
Comments
The syntax
The SSA result in
The
However, the |
I have a fix now. Will submit to CL soon. |
This commit optimizes extra MOVs, created by variable definition and channel receiving <- operation. The syntax <- chan will be translated to runtime call: func chanrecv1(c *hchan, elem unsafe.Pointer). Therefore, syntax <- c creates an addressable temporary variable, and the address of this temporary variable is passed as the second argument. order.go issues VarKill and Vardef for this temporary variable. So syntax x := <- c; c <- x will do it in three steps: * pass address of temporary variable to chanrecv1 and call * load value of the variable into register and mark death of temporary variable * mark creation of variable x and store value into x The relevant SSA result are as following: v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17 v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18 v20 (5) = Load <int> v19 v18 (x[int]) v21 (4) = VarKill <mem> {.autotmp_2} v18 v22 (6) = VarDef <mem> {.autotmp_2} v21 v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22 v24 (6) = Store <mem> {int} v23 v20 v22 This commit finds this temporary variable and extends the lifetime of this temporary variable. The same address is killed by VarKill and recreated by VarDef. The two operations are eliminated. A benchmark is performed on $GOROOT/test/bench/go1. The results are compared between: * old: go version devel +3f51350c70 * new: +3f51350c70 + this patch name old time/op new time/op delta BinaryTree17-12 1.96s ± 0% 1.96s ± 0% ~ (p=1.000 n=1+1) Fannkuch11-12 1.78s ± 0% 1.78s ± 0% ~ (p=1.000 n=1+1) FmtFprintfEmpty-12 25.4ns ± 0% 25.5ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfString-12 44.3ns ± 0% 43.3ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfInt-12 48.6ns ± 0% 49.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfIntInt-12 77.0ns ± 0% 79.2ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfPrefixedInt-12 81.9ns ± 0% 83.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfFloat-12 127ns ± 0% 127ns ± 0% ~ (all equal) FmtManyArgs-12 312ns ± 0% 310ns ± 0% ~ (p=1.000 n=1+1) GobDecode-12 3.27ms ± 0% 3.28ms ± 0% ~ (p=1.000 n=1+1) GobEncode-12 2.70ms ± 0% 2.66ms ± 0% ~ (p=1.000 n=1+1) Gzip-12 137ms ± 0% 138ms ± 0% ~ (p=1.000 n=1+1) Gunzip-12 21.4ms ± 0% 22.1ms ± 0% ~ (p=1.000 n=1+1) HTTPClientServer-12 40.4µs ± 0% 40.3µs ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 5.53ms ± 0% 5.59ms ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 24.7ms ± 0% 24.6ms ± 0% ~ (p=1.000 n=1+1) Mandelbrot200-12 2.84ms ± 0% 2.84ms ± 0% ~ (p=1.000 n=1+1) GoParse-12 2.02ms ± 0% 1.99ms ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 40.6ns ± 0% 41.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 108ns ± 0% 111ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 36.0ns ± 0% 36.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 186ns ± 0% 185ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) RegexpMatchMedium_1K-12 20.8µs ± 0% 20.5µs ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 984ns ± 0% 961ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 28.8µs ± 0% 28.8µs ± 0% ~ (p=1.000 n=1+1) Revcomp-12 276ms ± 0% 281ms ± 0% ~ (p=1.000 n=1+1) Template-12 32.0ms ± 0% 31.4ms ± 0% ~ (p=1.000 n=1+1) TimeParse-12 204ns ± 0% 202ns ± 0% ~ (p=1.000 n=1+1) TimeFormat-12 187ns ± 0% 188ns ± 0% ~ (p=1.000 n=1+1) name old speed new speed delta GobDecode-12 235MB/s ± 0% 234MB/s ± 0% ~ (p=1.000 n=1+1) GobEncode-12 284MB/s ± 0% 288MB/s ± 0% ~ (p=1.000 n=1+1) Gzip-12 142MB/s ± 0% 141MB/s ± 0% ~ (p=1.000 n=1+1) Gunzip-12 906MB/s ± 0% 879MB/s ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 351MB/s ± 0% 347MB/s ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 78.6MB/s ± 0% 78.8MB/s ± 0% ~ (p=1.000 n=1+1) GoParse-12 28.7MB/s ± 0% 29.1MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 788MB/s ± 0% 771MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 9.44GB/s ± 0% 9.26GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 888MB/s ± 0% 877MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 5.50GB/s ± 0% 5.54GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 281MB/s ± 0% 281MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_1K-12 49.3MB/s ± 0% 50.0MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 32.5MB/s ± 0% 33.3MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 35.5MB/s ± 0% 35.6MB/s ± 0% ~ (p=1.000 n=1+1) Revcomp-12 921MB/s ± 0% 905MB/s ± 0% ~ (p=1.000 n=1+1) Template-12 60.6MB/s ± 0% 61.8MB/s ± 0% ~ (p=1.000 n=1+1) Fixes golang#35969
Change https://golang.org/cl/212303 mentions this issue: |
This commit optimizes extra MOVs, created by variable definition and channel receiving <- operation. The syntax <- chan will be translated to runtime call: func chanrecv1(c *hchan, elem unsafe.Pointer). Therefore, syntax <- c creates an addressable temporary variable, and the address of this temporary variable is passed as the second argument. order.go issues VarKill and Vardef for this temporary variable. So syntax x := <- c; c <- x will do it in three steps: * pass address of temporary variable to chanrecv1 and call * load value of the variable into register and mark death of temporary variable * mark creation of variable x and store value into x The relevant SSA result are as following: v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17 v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18 v20 (5) = Load <int> v19 v18 (x[int]) v21 (4) = VarKill <mem> {.autotmp_2} v18 v22 (6) = VarDef <mem> {.autotmp_2} v21 v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22 v24 (6) = Store <mem> {int} v23 v20 v22 This commit finds this temporary variable and extends the lifetime of this temporary variable. The same address is killed by VarKill and recreated by VarDef. The two operations are eliminated. A benchmark is performed on $GOROOT/test/bench/go1. The results are compared between: * old: go version devel +3f51350c70 * new: +3f51350c70 + this patch name old time/op new time/op delta BinaryTree17-12 1.96s ± 0% 1.96s ± 0% ~ (p=1.000 n=1+1) Fannkuch11-12 1.78s ± 0% 1.78s ± 0% ~ (p=1.000 n=1+1) FmtFprintfEmpty-12 25.4ns ± 0% 25.5ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfString-12 44.3ns ± 0% 43.3ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfInt-12 48.6ns ± 0% 49.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfIntInt-12 77.0ns ± 0% 79.2ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfPrefixedInt-12 81.9ns ± 0% 83.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfFloat-12 127ns ± 0% 127ns ± 0% ~ (all equal) FmtManyArgs-12 312ns ± 0% 310ns ± 0% ~ (p=1.000 n=1+1) GobDecode-12 3.27ms ± 0% 3.28ms ± 0% ~ (p=1.000 n=1+1) GobEncode-12 2.70ms ± 0% 2.66ms ± 0% ~ (p=1.000 n=1+1) Gzip-12 137ms ± 0% 138ms ± 0% ~ (p=1.000 n=1+1) Gunzip-12 21.4ms ± 0% 22.1ms ± 0% ~ (p=1.000 n=1+1) HTTPClientServer-12 40.4µs ± 0% 40.3µs ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 5.53ms ± 0% 5.59ms ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 24.7ms ± 0% 24.6ms ± 0% ~ (p=1.000 n=1+1) Mandelbrot200-12 2.84ms ± 0% 2.84ms ± 0% ~ (p=1.000 n=1+1) GoParse-12 2.02ms ± 0% 1.99ms ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 40.6ns ± 0% 41.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 108ns ± 0% 111ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 36.0ns ± 0% 36.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 186ns ± 0% 185ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) RegexpMatchMedium_1K-12 20.8µs ± 0% 20.5µs ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 984ns ± 0% 961ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 28.8µs ± 0% 28.8µs ± 0% ~ (p=1.000 n=1+1) Revcomp-12 276ms ± 0% 281ms ± 0% ~ (p=1.000 n=1+1) Template-12 32.0ms ± 0% 31.4ms ± 0% ~ (p=1.000 n=1+1) TimeParse-12 204ns ± 0% 202ns ± 0% ~ (p=1.000 n=1+1) TimeFormat-12 187ns ± 0% 188ns ± 0% ~ (p=1.000 n=1+1) name old speed new speed delta GobDecode-12 235MB/s ± 0% 234MB/s ± 0% ~ (p=1.000 n=1+1) GobEncode-12 284MB/s ± 0% 288MB/s ± 0% ~ (p=1.000 n=1+1) Gzip-12 142MB/s ± 0% 141MB/s ± 0% ~ (p=1.000 n=1+1) Gunzip-12 906MB/s ± 0% 879MB/s ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 351MB/s ± 0% 347MB/s ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 78.6MB/s ± 0% 78.8MB/s ± 0% ~ (p=1.000 n=1+1) GoParse-12 28.7MB/s ± 0% 29.1MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 788MB/s ± 0% 771MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 9.44GB/s ± 0% 9.26GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 888MB/s ± 0% 877MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 5.50GB/s ± 0% 5.54GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 281MB/s ± 0% 281MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_1K-12 49.3MB/s ± 0% 50.0MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 32.5MB/s ± 0% 33.3MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 35.5MB/s ± 0% 35.6MB/s ± 0% ~ (p=1.000 n=1+1) Revcomp-12 921MB/s ± 0% 905MB/s ± 0% ~ (p=1.000 n=1+1) Template-12 60.6MB/s ± 0% 61.8MB/s ± 0% ~ (p=1.000 n=1+1) Fixes golang#35969
This commit optimizes extra MOVs, created by variable definition and channel receiving <- operation. The syntax <- chan will be translated to runtime call: func chanrecv1(c *hchan, elem unsafe.Pointer). Therefore, syntax <- c creates an addressable temporary variable, and the address of this temporary variable is passed as the second argument. order.go issues VarKill and Vardef for this temporary variable. So syntax x := <- c; c <- x will do it in three steps: * pass address of temporary variable to chanrecv1 and call * load value of the variable into register and mark death of temporary variable * mark creation of variable x and store value into x The relevant SSA result are as following: v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17 v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18 v20 (5) = Load <int> v19 v18 (x[int]) v21 (4) = VarKill <mem> {.autotmp_2} v18 v22 (6) = VarDef <mem> {.autotmp_2} v21 v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22 v24 (6) = Store <mem> {int} v23 v20 v22 This commit finds this temporary variable and extends the lifetime of this temporary variable. The same address is killed by VarKill and recreated by VarDef. The two operations are eliminated. InlMark is a special case: it won't return a dummy mem, which can be used by next SSA. This can cause a failure in the following case during pass ssa/schedule.go: v185 (864) = InterCall <mem> [32] v180 v184 v187 (864) = Load <Range> v186 v185 (r[Range]) v188 (864) = LocalAddr <*ssa.Value> {.autotmp_18} v2 v185 v189 (864) = Load <ssa.Value> v188 v185 (x[ssa.Value]) v190 (863) = VarKill <mem> {.autotmp_18} v185 v191 (864) = InlMark <void> [414] v190 v192 (708) = VarDef <mem> {.autotmp_18} v190 v193 (708) = LocalAddr <*ssa.Value> {.autotmp_18} v2 v192 v194 (708) = Store <mem> {ssa.Value} v193 v189 v192 The InlMark (v191) keeps v190 alive and therefore v190 (VarKill) is kept, while corresponding Vardef is eliminated. The dummy mem() is taken squentially, thus let InlMark use the mem() before VarKill. A regex search ^\s+s\.newValue is used to find those cases that returned value of s.newValue*() is not utilized. The only two matches are InlMark and NilCheck. However, NilCheck cannot work with this fix (unknown reason, error in building go toolchain). To validate this fix, the following tests have been done: * build go on Linux/amd64 (all.bash) * build go on darwin64 (all.bash) * build honnef.co/go/tools@v0.0.1-2019.2.3 on Linux/amd64 * build and pass tests for prometheus * build and pass tests pingcap/tidb A benchmark is performed on $GOROOT/test/bench/go1. The results are compared between: * old: go version devel +3f51350c70 * new: +3f51350c70 + this patch name old time/op new time/op delta BinaryTree17-12 1.96s ± 0% 1.96s ± 0% ~ (p=1.000 n=1+1) Fannkuch11-12 1.78s ± 0% 1.78s ± 0% ~ (p=1.000 n=1+1) FmtFprintfEmpty-12 25.4ns ± 0% 25.5ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfString-12 44.3ns ± 0% 43.3ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfInt-12 48.6ns ± 0% 49.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfIntInt-12 77.0ns ± 0% 79.2ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfPrefixedInt-12 81.9ns ± 0% 83.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfFloat-12 127ns ± 0% 127ns ± 0% ~ (all equal) FmtManyArgs-12 312ns ± 0% 310ns ± 0% ~ (p=1.000 n=1+1) GobDecode-12 3.27ms ± 0% 3.28ms ± 0% ~ (p=1.000 n=1+1) GobEncode-12 2.70ms ± 0% 2.66ms ± 0% ~ (p=1.000 n=1+1) Gzip-12 137ms ± 0% 138ms ± 0% ~ (p=1.000 n=1+1) Gunzip-12 21.4ms ± 0% 22.1ms ± 0% ~ (p=1.000 n=1+1) HTTPClientServer-12 40.4µs ± 0% 40.3µs ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 5.53ms ± 0% 5.59ms ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 24.7ms ± 0% 24.6ms ± 0% ~ (p=1.000 n=1+1) Mandelbrot200-12 2.84ms ± 0% 2.84ms ± 0% ~ (p=1.000 n=1+1) GoParse-12 2.02ms ± 0% 1.99ms ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 40.6ns ± 0% 41.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 108ns ± 0% 111ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 36.0ns ± 0% 36.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 186ns ± 0% 185ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) RegexpMatchMedium_1K-12 20.8µs ± 0% 20.5µs ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 984ns ± 0% 961ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 28.8µs ± 0% 28.8µs ± 0% ~ (p=1.000 n=1+1) Revcomp-12 276ms ± 0% 281ms ± 0% ~ (p=1.000 n=1+1) Template-12 32.0ms ± 0% 31.4ms ± 0% ~ (p=1.000 n=1+1) TimeParse-12 204ns ± 0% 202ns ± 0% ~ (p=1.000 n=1+1) TimeFormat-12 187ns ± 0% 188ns ± 0% ~ (p=1.000 n=1+1) name old speed new speed delta GobDecode-12 235MB/s ± 0% 234MB/s ± 0% ~ (p=1.000 n=1+1) GobEncode-12 284MB/s ± 0% 288MB/s ± 0% ~ (p=1.000 n=1+1) Gzip-12 142MB/s ± 0% 141MB/s ± 0% ~ (p=1.000 n=1+1) Gunzip-12 906MB/s ± 0% 879MB/s ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 351MB/s ± 0% 347MB/s ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 78.6MB/s ± 0% 78.8MB/s ± 0% ~ (p=1.000 n=1+1) GoParse-12 28.7MB/s ± 0% 29.1MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 788MB/s ± 0% 771MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 9.44GB/s ± 0% 9.26GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 888MB/s ± 0% 877MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 5.50GB/s ± 0% 5.54GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 281MB/s ± 0% 281MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_1K-12 49.3MB/s ± 0% 50.0MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 32.5MB/s ± 0% 33.3MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 35.5MB/s ± 0% 35.6MB/s ± 0% ~ (p=1.000 n=1+1) Revcomp-12 921MB/s ± 0% 905MB/s ± 0% ~ (p=1.000 n=1+1) Template-12 60.6MB/s ± 0% 61.8MB/s ± 0% ~ (p=1.000 n=1+1) Fixes golang#35969
This commit optimizes extra MOVs, created by variable definition and channel receiving <- operation. The syntax <- chan will be translated to runtime call: func chanrecv1(c *hchan, elem unsafe.Pointer). Therefore, syntax <- c creates an addressable temporary variable, and the address of this temporary variable is passed as the second argument. order.go issues VarKill and Vardef for this temporary variable. So syntax x := <- c; c <- x will do it in three steps: * pass address of temporary variable to chanrecv1 and call * load value of the variable into register and mark death of temporary variable * mark creation of variable x and store value into x The relevant SSA result are as following: v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17 v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18 v20 (5) = Load <int> v19 v18 (x[int]) v21 (4) = VarKill <mem> {.autotmp_2} v18 v22 (6) = VarDef <mem> {.autotmp_2} v21 v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22 v24 (6) = Store <mem> {int} v23 v20 v22 This commit finds this temporary variable and extends the lifetime of this temporary variable. The same address is killed by VarKill and recreated by VarDef. The two operations are eliminated. However, there is a special case to prevent this rule from being applied. SSA values' with "typ void" as type can refer to VarKill, which can lead to a result that VarKill is kept and VarDef is removed. The solution is to limit the Uses count on the VarDef and VarKill, because their side-effect changes the liveness of variables. VarDef must be referred twice: LocalAddr and Store in the same pattern; VarKill must be only referred by VarKill because it is a temporary variable. To validate this fix, the following tests have been done: * build go on Linux/amd64 (all.bash) * build go on darwin64 (all.bash) * build go on windows/amd64 (all.bat, CGO_ENABLED=0) * build honnef.co/go/tools@v0.0.1-2019.2.3 on Linux/amd64 * build and pass tests for prometheus * build and pass tests pingcap/tidb A benchmark is performed on $GOROOT/test/bench/go1. The results are compared between: * old: go version devel +3f51350c70 * new: +3f51350c70 + this patch name old time/op new time/op delta BinaryTree17-12 1.96s ± 0% 1.96s ± 0% ~ (p=1.000 n=1+1) Fannkuch11-12 1.78s ± 0% 1.78s ± 0% ~ (p=1.000 n=1+1) FmtFprintfEmpty-12 25.4ns ± 0% 25.5ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfString-12 44.3ns ± 0% 43.3ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfInt-12 48.6ns ± 0% 49.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfIntInt-12 77.0ns ± 0% 79.2ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfPrefixedInt-12 81.9ns ± 0% 83.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfFloat-12 127ns ± 0% 127ns ± 0% ~ (all equal) FmtManyArgs-12 312ns ± 0% 310ns ± 0% ~ (p=1.000 n=1+1) GobDecode-12 3.27ms ± 0% 3.28ms ± 0% ~ (p=1.000 n=1+1) GobEncode-12 2.70ms ± 0% 2.66ms ± 0% ~ (p=1.000 n=1+1) Gzip-12 137ms ± 0% 138ms ± 0% ~ (p=1.000 n=1+1) Gunzip-12 21.4ms ± 0% 22.1ms ± 0% ~ (p=1.000 n=1+1) HTTPClientServer-12 40.4µs ± 0% 40.3µs ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 5.53ms ± 0% 5.59ms ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 24.7ms ± 0% 24.6ms ± 0% ~ (p=1.000 n=1+1) Mandelbrot200-12 2.84ms ± 0% 2.84ms ± 0% ~ (p=1.000 n=1+1) GoParse-12 2.02ms ± 0% 1.99ms ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 40.6ns ± 0% 41.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 108ns ± 0% 111ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 36.0ns ± 0% 36.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 186ns ± 0% 185ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) RegexpMatchMedium_1K-12 20.8µs ± 0% 20.5µs ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 984ns ± 0% 961ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 28.8µs ± 0% 28.8µs ± 0% ~ (p=1.000 n=1+1) Revcomp-12 276ms ± 0% 281ms ± 0% ~ (p=1.000 n=1+1) Template-12 32.0ms ± 0% 31.4ms ± 0% ~ (p=1.000 n=1+1) TimeParse-12 204ns ± 0% 202ns ± 0% ~ (p=1.000 n=1+1) TimeFormat-12 187ns ± 0% 188ns ± 0% ~ (p=1.000 n=1+1) name old speed new speed delta GobDecode-12 235MB/s ± 0% 234MB/s ± 0% ~ (p=1.000 n=1+1) GobEncode-12 284MB/s ± 0% 288MB/s ± 0% ~ (p=1.000 n=1+1) Gzip-12 142MB/s ± 0% 141MB/s ± 0% ~ (p=1.000 n=1+1) Gunzip-12 906MB/s ± 0% 879MB/s ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 351MB/s ± 0% 347MB/s ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 78.6MB/s ± 0% 78.8MB/s ± 0% ~ (p=1.000 n=1+1) GoParse-12 28.7MB/s ± 0% 29.1MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 788MB/s ± 0% 771MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 9.44GB/s ± 0% 9.26GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 888MB/s ± 0% 877MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 5.50GB/s ± 0% 5.54GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 281MB/s ± 0% 281MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_1K-12 49.3MB/s ± 0% 50.0MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 32.5MB/s ± 0% 33.3MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 35.5MB/s ± 0% 35.6MB/s ± 0% ~ (p=1.000 n=1+1) Revcomp-12 921MB/s ± 0% 905MB/s ± 0% ~ (p=1.000 n=1+1) Template-12 60.6MB/s ± 0% 61.8MB/s ± 0% ~ (p=1.000 n=1+1) Fixes golang#35969
This commit optimizes extra MOVs, created by variable definition and channel receiving <- operation. The syntax <- chan will be translated to runtime call: func chanrecv1(c *hchan, elem unsafe.Pointer). Therefore, syntax <- c creates an addressable temporary variable, and the address of this temporary variable is passed as the second argument. order.go issues VarKill and Vardef for this temporary variable. So syntax x := <- c; c <- x will do it in three steps: * pass address of temporary variable to chanrecv1 and call * load value of the variable into register and mark death of temporary variable * mark creation of variable x and store value into x The relevant SSA result are as following: v18 (+5) = StaticCall <mem> {runtime.chanrecv1} [16] v17 v19 (5) = LocalAddr <*int> {.autotmp_2} v2 v18 v20 (5) = Load <int> v19 v18 (x[int]) v21 (4) = VarKill <mem> {.autotmp_2} v18 v22 (6) = VarDef <mem> {.autotmp_2} v21 v23 (6) = LocalAddr <*int> {.autotmp_2} v2 v22 v24 (6) = Store <mem> {int} v23 v20 v22 This commit finds this temporary variable and extends the lifetime of this temporary variable. The same address is killed by VarKill and recreated by VarDef. The two operations are eliminated. However, there is a special case to prevent this rule from being applied. SSA values' with "typ void" as type can refer to VarKill, which can lead to a result that VarKill is kept and VarDef is removed. The solution is to limit the Uses count on the VarDef and VarKill, because their side-effect changes the liveness of variables. VarDef must be referred twice: LocalAddr and Store in the same pattern; VarKill must be only referred by VarKill because it is a temporary variable. To validate this fix, the following tests have been done: * build go on Linux/amd64 (all.bash) * build go on darwin64 (all.bash) * build honnef.co/go/tools@v0.0.1-2019.2.3 on Linux/amd64 * build and pass tests for prometheus * build and pass tests pingcap/tidb A benchmark is performed on $GOROOT/test/bench/go1. The results are compared between: * old: go version devel +3f51350c70 * new: +3f51350c70 + this patch name old time/op new time/op delta BinaryTree17-12 1.96s ± 0% 1.96s ± 0% ~ (p=1.000 n=1+1) Fannkuch11-12 1.78s ± 0% 1.78s ± 0% ~ (p=1.000 n=1+1) FmtFprintfEmpty-12 25.4ns ± 0% 25.5ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfString-12 44.3ns ± 0% 43.3ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfInt-12 48.6ns ± 0% 49.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfIntInt-12 77.0ns ± 0% 79.2ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfPrefixedInt-12 81.9ns ± 0% 83.1ns ± 0% ~ (p=1.000 n=1+1) FmtFprintfFloat-12 127ns ± 0% 127ns ± 0% ~ (all equal) FmtManyArgs-12 312ns ± 0% 310ns ± 0% ~ (p=1.000 n=1+1) GobDecode-12 3.27ms ± 0% 3.28ms ± 0% ~ (p=1.000 n=1+1) GobEncode-12 2.70ms ± 0% 2.66ms ± 0% ~ (p=1.000 n=1+1) Gzip-12 137ms ± 0% 138ms ± 0% ~ (p=1.000 n=1+1) Gunzip-12 21.4ms ± 0% 22.1ms ± 0% ~ (p=1.000 n=1+1) HTTPClientServer-12 40.4µs ± 0% 40.3µs ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 5.53ms ± 0% 5.59ms ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 24.7ms ± 0% 24.6ms ± 0% ~ (p=1.000 n=1+1) Mandelbrot200-12 2.84ms ± 0% 2.84ms ± 0% ~ (p=1.000 n=1+1) GoParse-12 2.02ms ± 0% 1.99ms ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 40.6ns ± 0% 41.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 108ns ± 0% 111ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 36.0ns ± 0% 36.5ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 186ns ± 0% 185ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) RegexpMatchMedium_1K-12 20.8µs ± 0% 20.5µs ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 984ns ± 0% 961ns ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 28.8µs ± 0% 28.8µs ± 0% ~ (p=1.000 n=1+1) Revcomp-12 276ms ± 0% 281ms ± 0% ~ (p=1.000 n=1+1) Template-12 32.0ms ± 0% 31.4ms ± 0% ~ (p=1.000 n=1+1) TimeParse-12 204ns ± 0% 202ns ± 0% ~ (p=1.000 n=1+1) TimeFormat-12 187ns ± 0% 188ns ± 0% ~ (p=1.000 n=1+1) name old speed new speed delta GobDecode-12 235MB/s ± 0% 234MB/s ± 0% ~ (p=1.000 n=1+1) GobEncode-12 284MB/s ± 0% 288MB/s ± 0% ~ (p=1.000 n=1+1) Gzip-12 142MB/s ± 0% 141MB/s ± 0% ~ (p=1.000 n=1+1) Gunzip-12 906MB/s ± 0% 879MB/s ± 0% ~ (p=1.000 n=1+1) JSONEncode-12 351MB/s ± 0% 347MB/s ± 0% ~ (p=1.000 n=1+1) JSONDecode-12 78.6MB/s ± 0% 78.8MB/s ± 0% ~ (p=1.000 n=1+1) GoParse-12 28.7MB/s ± 0% 29.1MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_32-12 788MB/s ± 0% 771MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy0_1K-12 9.44GB/s ± 0% 9.26GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_32-12 888MB/s ± 0% 877MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchEasy1_1K-12 5.50GB/s ± 0% 5.54GB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_32-12 281MB/s ± 0% 281MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchMedium_1K-12 49.3MB/s ± 0% 50.0MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_32-12 32.5MB/s ± 0% 33.3MB/s ± 0% ~ (p=1.000 n=1+1) RegexpMatchHard_1K-12 35.5MB/s ± 0% 35.6MB/s ± 0% ~ (p=1.000 n=1+1) Revcomp-12 921MB/s ± 0% 905MB/s ± 0% ~ (p=1.000 n=1+1) Template-12 60.6MB/s ± 0% 61.8MB/s ± 0% ~ (p=1.000 n=1+1) Fixes golang#35969
No longer reproduces. Fixed in Go 1.22 |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes.
What did you do?
I compiled the following function:
What did you expect to see?
I expected that the generated code would not contain unnecessary instructions.
What did you see instead?
Instead, there are a couple of move instructions that don't accomplish anything (lines 25-26 in https://godbolt.org/z/RnTRWz).
The text was updated successfully, but these errors were encountered: