cmd/compile: SSA performance regression due to array zeroing #15925

Open
ghost opened this issue Jun 1, 2016 · 14 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. Performance
Comments

@ghost

ghost commented Jun 1, 2016

go version devel +bbd1dcd Wed Jun 1 21:32:46 2016 +0000 darwin/amd64

https://github.com/hydroflame/go1.7bench
This very simple program takes two 4-element vectors and performs component-wise addition.
If you run the benchmark with and without SSA, you will see that it is much faster without SSA.
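For context, a minimal sketch of the kind of code under test (the type name, method signature, and sink variable are guesses based on the assembly quoted later in the thread; the actual code is in the linked repo):

package vec

import "testing"

type vec4 [4]float32

// add returns the component-wise sum of v0 and v1.
func (v0 *vec4) add(v1 *vec4) vec4 {
	return vec4{v0[0] + v1[0], v0[1] + v1[1], v0[2] + v1[2], v0[3] + v1[3]}
}

// tmp is a package-level sink so the compiler cannot discard the result.
var tmp vec4

func BenchmarkAdd(b *testing.B) {
	v0 := vec4{1, 2, 3, 4}
	v1 := vec4{5, 6, 7, 8}
	for i := 0; i < b.N; i++ {
		tmp = v0.add(&v1)
	}
}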

$ go test -run NONE -bench . -gcflags -ssa=0 > nossa.txt
testing: warning: no tests to run
$ go test -run NONE -bench . -gcflags -ssa=1 > ssa.txt
testing: warning: no tests to run
$ benchcmp nossa.txt ssa.txt 
benchmark          old ns/op     new ns/op     delta
BenchmarkAdd-4     2.45          11.4          +365.31%

I expect the binary generated with SSA to be at least as fast as the one produced without it.

This particular example is extracted from one of my bigger libraries: github.com/luxengine/glm.
For more examples of poor SSA performance, run the benchmarks in:

  • github.com/luxengine/glm
  • github.com/luxengine/glm/geo
@cespare
Contributor

cespare commented Jun 1, 2016

/cc @randall77

@randall77 randall77 added this to the Go1.8Early milestone Jun 1, 2016
@randall77
Contributor

The SSA backend doesn't handle arrays terribly well, as they are not SSAable. This leads to some extra copies that the old backend was able to avoid.

add is inlined in this test. The inner loop of BenchmarkAdd is (with SSA):

    0x0047 00071 (vec_test.go:12)   MOVQ    $0, "".~r1(SP)
    0x004f 00079 (vec_test.go:12)   MOVQ    $0, "".~r1+8(SP)
    0x0058 00088 (vec_test.go:12)   MOVQ    $0, "".autotmp_5+48(SP)
    0x0061 00097 (vec_test.go:12)   MOVQ    $0, "".autotmp_5+56(SP)
    0x006a 00106 (vec_test.go:12)   MOVSS   "".v0+32(SP), X0
    0x0070 00112 (vec_test.go:12)   MOVSS   "".v1+16(SP), X1
    0x0076 00118 (vec_test.go:12)   ADDSS   X1, X0
    0x007a 00122 (vec_test.go:12)   MOVSS   X0, "".autotmp_5+48(SP)
    0x0080 00128 (vec_test.go:12)   MOVSS   "".v0+36(SP), X0
    0x0086 00134 (vec_test.go:12)   MOVSS   "".v1+20(SP), X1
    0x008c 00140 (vec_test.go:12)   ADDSS   X1, X0
    0x0090 00144 (vec_test.go:12)   MOVSS   X0, "".autotmp_5+52(SP)
    0x0096 00150 (vec_test.go:12)   MOVSS   "".v0+40(SP), X0
    0x009c 00156 (vec_test.go:12)   MOVSS   "".v1+24(SP), X1
    0x00a2 00162 (vec_test.go:12)   ADDSS   X1, X0
    0x00a6 00166 (vec_test.go:12)   MOVSS   X0, "".autotmp_5+56(SP)
    0x00ac 00172 (vec_test.go:12)   MOVSS   "".v0+44(SP), X0
    0x00b2 00178 (vec_test.go:12)   MOVSS   "".v1+28(SP), X1
    0x00b8 00184 (vec_test.go:12)   ADDSS   X1, X0
    0x00bc 00188 (vec_test.go:12)   MOVSS   X0, "".autotmp_5+60(SP)
    0x00c2 00194 (vec_test.go:12)   MOVUPS  "".autotmp_5+48(SP), X0
    0x00c7 00199 (vec_test.go:12)   MOVUPS  X0, "".~r1(SP)
    0x00cb 00203 (vec_test.go:12)   MOVUPS  "".~r1(SP), X0
    0x00cf 00207 (vec_test.go:12)   MOVUPS  X0, "".tmp(SB)
    0x00d6 00214 (vec_test.go:11)   INCQ    CX
    0x00d9 00217 (vec_test.go:11)   MOVQ    184(AX), DX
    0x00e0 00224 (vec_test.go:11)   CMPQ    CX, DX
    0x00e3 00227 (vec_test.go:11)   JLT $0, 71

Note the unnecessary zeroings of ~r1 and autotmp_5, and the copy chain autotmp_5 -> ~r1 -> tmp.

For comparison, the same inner loop without SSA:
    0x0090 00144 (vec_test.go:12)   LEAQ    "".v0+16(SP), BX
    0x0095 00149 (vec_test.go:12)   MOVQ    BX, CX
    0x0098 00152 (vec_test.go:12)   LEAQ    "".v1(SP), BX
    0x009c 00156 (vec_test.go:12)   MOVQ    BX, AX
    0x009f 00159 (vec_test.go:12)   MOVQ    $0, BX
    0x00a1 00161 (vec_test.go:12)   XORPS   X0, X0
    0x00a4 00164 (vec_test.go:12)   MOVQ    $0, BX
    0x00a6 00166 (vec_test.go:12)   MOVSS   (CX), X4
    0x00aa 00170 (vec_test.go:12)   MOVSS   (AX), X1
    0x00ae 00174 (vec_test.go:12)   ADDSS   X1, X4
    0x00b2 00178 (vec_test.go:12)   MOVSS   4(CX), X3
    0x00b7 00183 (vec_test.go:12)   MOVSS   4(AX), X1
    0x00bc 00188 (vec_test.go:12)   ADDSS   X1, X3
    0x00c0 00192 (vec_test.go:12)   MOVSS   8(CX), X2
    0x00c5 00197 (vec_test.go:12)   MOVSS   8(AX), X1
    0x00ca 00202 (vec_test.go:12)   ADDSS   X1, X2
    0x00ce 00206 (vec_test.go:12)   MOVSS   12(CX), X0
    0x00d3 00211 (vec_test.go:12)   MOVSS   12(AX), X1
    0x00d8 00216 (vec_test.go:12)   ADDSS   X1, X0
    0x00dc 00220 (vec_test.go:12)   MOVSS   X4, "".tmp(SB)
    0x00e4 00228 (vec_test.go:12)   MOVSS   X3, "".tmp+4(SB)
    0x00ec 00236 (vec_test.go:12)   MOVSS   X2, "".tmp+8(SB)
    0x00f4 00244 (vec_test.go:12)   MOVSS   X0, "".tmp+12(SB)
    0x00fc 00252 (vec_test.go:11)   INCQ    DX
    0x00ff 00255 (vec_test.go:11)   MOVQ    184(SI), BX
    0x0106 00262 (vec_test.go:11)   CMPQ    BX, DX
    0x0109 00265 (vec_test.go:11)   JGT $0, 144

The old compiler does the direct write to the destination.

This is on my radar to fix, but it isn't easy. Hopefully for 1.8.
See also #14762; it applies to the uninlined version of add.

@ghost
Author

ghost commented Jun 1, 2016

If I changed it to vec4{x, y, z, w float32}, would that solve the problem?
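That is, something along these lines (a sketch of the struct variant; field names taken from the question):

type vec4 struct{ x, y, z, w float32 }

// add returns the component-wise sum of v0 and v1, using a struct
// instead of a [4]float32 array so the SSA backend can decompose it.
func (v0 *vec4) add(v1 *vec4) vec4 {
	return vec4{v0.x + v1.x, v0.y + v1.y, v0.z + v1.z, v0.w + v1.w}
}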

@randall77
Contributor

randall77 commented Jun 1, 2016

It might. Another workaround is to not pass vectors around by value: use []float32 or *[4]float32 everywhere. (It is a bit odd that your add takes *vec4 arguments but returns a vec4; it might be better anyway to use func (r *vec4) add(v0, v1 *vec4).)
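Concretely, the suggested shape would be something like this (a sketch, assuming vec4 is still [4]float32):

// add writes the component-wise sum of v0 and v1 into r, so no
// array value is ever passed or returned by value.
func (r *vec4) add(v0, v1 *vec4) {
	r[0] = v0[0] + v1[0]
	r[1] = v0[1] + v1[1]
	r[2] = v0[2] + v1[2]
	r[3] = v0[3] + v1[3]
}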

@ghost
Author

ghost commented Jun 2, 2016

I tested it; it's definitely faster with structs.

@josharian josharian changed the title cmd/compile: performance much worst with SSA cmd/compile: SSA performance regression due to array zeroing Jun 3, 2016
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 6, 2016
@rsc
Contributor

rsc commented Oct 17, 2016

Why are structs and small arrays treated differently? Is blasting apart small arrays (the same way small structs are blasted apart) an easy optimization to add? I believe the old back ends did this for arrays of size <= 4 or something like that.

/cc @dr2chase @randall77 @cherrymui

@cherrymui
Member

I believe it does not decompose arrays because it doesn't know how to handle non-constant indexing. There is a TODO in gc/ssa.go:

    case TARRAY:
        // We can't do arrays because dynamic indexing is
        // not supported on SSA variables.
        // TODO: maybe allow if length is <=1?  All indexes
        // are constant?  Might be good for the arrays
        // introduced by the compiler for variadic functions.

@rsc rsc modified the milestones: Go1.8, Go1.8Early Oct 20, 2016
@rsc
Contributor

rsc commented Oct 21, 2016

Interesting. So the old back end, because it just worked on stack words, saw that these words were being used, never had their address taken, and completely registerized them. And SSA isn't smart enough to do that because it decides about variable tracking early? The implication here, though, is that all the array refs are constant indexes. Maybe if you know the array is only ever accessed by constant index (and address not taken, but that's needed for structs already) then it can be split apart?
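As an illustration of that distinction (hypothetical snippet, not from the thread): with only constant indexes and no address taken, the elements behave like independent scalars, while a single variable index forces the array into memory.

// Constant indexes only: a[0] and a[1] could in principle live in
// registers, just like two ordinary local variables.
func constIdx(x, y float32) float32 {
	var a [2]float32
	a[0] = x
	a[1] = y
	return a[0] + a[1]
}

// One variable index: without further analysis the array needs a real
// memory location so the index can be applied at run time.
func varIdx(x, y float32, i int) float32 {
	var a [2]float32
	a[0] = x
	a[1] = y
	return a[i]
}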

@randall77
Contributor

Yes, if we knew that all accesses used constant indexes we could registerize it.
Or if it is a [1] array we can registerize it, because the only index that can get past the bounds check is 0.
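For example (hypothetical snippet): for a [1]float32, any index other than 0 panics at the bounds check before the load, so the only access that can succeed is a[0], and the array can be treated as a single scalar.

func first(a [1]float32, i int) float32 {
	// If i != 0 the bounds check panics before the read, so this load
	// can only ever observe a[0].
	return a[i]
}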

@egonelbre
Contributor

I was measuring different approaches to implementing Vector.Add. I almost made a new issue, but eventually found this one.

One specific case stood out: https://play.golang.org/p/vzC344lntX. There is a 20x performance difference.

type Struct struct{ X, Y, Z float32 }
func (a Struct) Add(b Struct) Struct { return Struct{a.X + b.X, a.Y + b.Y, a.Z + b.Z} }

type Array [3]float32
func (a Array) Add(b Array) Array { return Array{a[0] + b[0], a[1] + b[1], a[2] + b[2]} }

BenchmarkStructAdd-8               2000000000               0.98 ns/op
BenchmarkArrayAdd-8                100000000               20.7 ns/op

Exhaustive benchmark of different approaches: https://play.golang.org/p/6diXIPOWu5

        Struct         Array
PP      2.91 ns/op     3.00 ns/op
PN      2.93 ns/op     2.94 ns/op
PPP     4.89 ns/op     9.41 ns/op
NNN     0.98 ns/op    20.70 ns/op
PPN     2.91 ns/op    17.70 ns/op
PNP     4.89 ns/op     8.94 ns/op
NPP     2.93 ns/op    15.20 ns/op
NNP     2.93 ns/op    14.30 ns/op
NPN     1.00 ns/op    21.00 ns/op
PNN     2.93 ns/op    15.00 ns/op
PPZ     2.92 ns/op     9.18 ns/op
NPZ     2.91 ns/op    14.90 ns/op
PNZ     2.92 ns/op     8.85 ns/op
NNZ     2.93 ns/op    14.30 ns/op
PPY     2.96 ns/op     2.94 ns/op
NPY     2.92 ns/op     8.74 ns/op
PNY     2.92 ns/op     2.95 ns/op
NNY     2.93 ns/op     9.05 ns/op

(go version devel +11eaf42 Sat Apr 29 04:15:49 2017 +0000 windows/amd64)

@tzneal
Member

tzneal commented Apr 29, 2017

I'm working on a change that allows constant-indexed small arrays to be SSA'd. I'm not 100% happy with the approach (storing a flag on the array type, instead of on the node, to indicate that it was variably indexed), so I'm taking another look, but it does work and will solve this:

goos: linux
goarch: amd64
BenchmarkStruct-8   	2000000000	         0.78 ns/op
BenchmarkArray-8    	2000000000	         0.78 ns/op
PASS
ok  	_/tmp/foo	3.277s

@egonelbre
Contributor

@tzneal wouldn't it be possible to rewrite variably indexed statements into switches? Just as a proof of concept: https://play.golang.org/p/m0n1E7Q2kg (I'm sure there is a way to avoid the performance hit introduced by the panic when you have access to codegen). A sketch of the idea follows the numbers below.

BenchmarkStruct-8               2000000000               0.98 ns/op
BenchmarkArray-8                50000000                29.7 ns/op
BenchmarkArray2-8               100000000               22.0 ns/op
BenchmarkArray3-8               100000000               19.9 ns/op
BenchmarkArrayStruct-8          100000000               19.5 ns/op
BenchmarkArrayStruct3-8         200000000                6.91 ns/op
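A hand-written sketch of the kind of rewrite being proposed (the playground link has the measured variants): a variable index into a small array becomes a switch over the possible constant indexes, so each arm uses only constant indexing.

// index3 reads a[i] without a variable index into the array itself,
// so a could in principle stay decomposed into scalars.
func index3(a [3]float32, i int) float32 {
	switch i {
	case 0:
		return a[0]
	case 1:
		return a[1]
	case 2:
		return a[2]
	default:
		panic("index out of range")
	}
}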

@randall77 randall77 modified the milestones: Go1.10, Go1.9 Jun 6, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017
@stuartcarnie

stuartcarnie commented Feb 15, 2018

I tested a scenario using var acc [4]float64 vs var acc0, acc1, acc2, acc3 float64, and the 4-element array version does not keep the accumulator values in registers.

In the following version, the accumulator values are written back to the stack on every loop iteration:

func SumFloat64Unroll_array(buf []float64) float64 {
	var (
		acc [4]float64
	)
	for i := 0; i < len(buf); i += 4 {
		bb := (*[4]float64)(unsafe.Pointer(&buf[i]))
		acc[0] += bb[0]
		acc[1] += bb[1]
		acc[2] += bb[2]
		acc[3] += bb[3]
	}
	return acc[0] + acc[1] + acc[2] + acc[3]
}

which compiles to:

  movsd (CX)(DX*8), X0
  addsd "".acc(SP), X0
  movsd X0, "".acc(SP)
  movsd 8(CX)(DX*8), X0
  addsd "".acc+8(SP), X0
  movsd X0, "".acc+8(SP)
  movsd 16(CX)(DX*8), X0
  addsd "".acc+16(SP), X0
  movsd X0, "".acc+16(SP)
  movsd 24(CX)(DX*8), X0
  addsd "".acc+24(SP), X0
  movsd X0, "".acc+24(SP)

In this version, by contrast, each accumulator remains in a register for the duration of the loop:

func SumFloat64Unroll_vars(buf []float64) float64 {
	var (
		acc0, acc1, acc2, acc3 float64
	)
	for i := 0; i < len(buf); i += 4 {
		bb := (*[4]float64)(unsafe.Pointer(&buf[i]))
		acc0 += bb[0]
		acc1 += bb[1]
		acc2 += bb[2]
		acc3 += bb[3]
	}
	return acc0 + acc1 + acc2 + acc3
}

which compiles to:

  movsd (CX)(DX*8), X4
  addsd X4, X0
  movsd 8(CX)(DX*8), X4
  addsd X4, X1
  movsd 16(CX)(DX*8), X4
  addsd X4, X2
  movsd 24(CX)(DX*8), X4
  addsd X4, X3

@gopherbot

Change https://golang.org/cl/106495 mentions this issue: cmd/compile: add some generic composite type optimizations

gopherbot pushed a commit that referenced this issue May 8, 2018
Propagate values through some wide Zero/Move operations. Among
other things this allows us to optimize some kinds of array
initialization. For example, the following code no longer
requires a temporary be allocated on the stack. Instead it
writes the values directly into the return value.

func f(i uint32) [4]uint32 {
    return [4]uint32{i, i+1, i+2, i+3}
}

The return value is unnecessarily cleared but removing that is
probably a task for dead store analysis (I think it needs to
be able to match multiple Store ops to wide Zero ops).

In order to reliably remove stack variables that are rendered
unnecessary by these new rules I've added a new generic version
of the unread autos elimination pass.

These rules are triggered more than 5000 times when building and
testing the standard library.

Updates #15925 (fixes for arrays of up to 4 elements).
Updates #24386 (fixes for up to 4 kept elements).
Updates #24416.

compilebench results:

name       old time/op       new time/op       delta
Template         353ms ± 5%        359ms ± 3%    ~     (p=0.143 n=10+10)
Unicode          219ms ± 1%        217ms ± 4%    ~     (p=0.740 n=7+10)
GoTypes          1.26s ± 1%        1.26s ± 2%    ~     (p=0.549 n=9+10)
Compiler         6.00s ± 1%        6.08s ± 1%  +1.42%  (p=0.000 n=9+8)
SSA              15.3s ± 2%        15.6s ± 1%  +2.43%  (p=0.000 n=10+10)
Flate            237ms ± 2%        240ms ± 2%  +1.31%  (p=0.015 n=10+10)
GoParser         285ms ± 1%        285ms ± 1%    ~     (p=0.878 n=8+8)
Reflect          797ms ± 3%        807ms ± 2%    ~     (p=0.065 n=9+10)
Tar              334ms ± 0%        335ms ± 4%    ~     (p=0.460 n=8+10)
XML              419ms ± 0%        423ms ± 1%  +0.91%  (p=0.001 n=7+9)
StdCmd           46.0s ± 0%        46.4s ± 0%  +0.85%  (p=0.000 n=9+9)

name       old user-time/op  new user-time/op  delta
Template         337ms ± 3%        346ms ± 5%    ~     (p=0.053 n=9+10)
Unicode          205ms ±10%        205ms ± 8%    ~     (p=1.000 n=10+10)
GoTypes          1.22s ± 2%        1.21s ± 3%    ~     (p=0.436 n=10+10)
Compiler         5.85s ± 1%        5.93s ± 0%  +1.46%  (p=0.000 n=10+8)
SSA              14.9s ± 1%        15.3s ± 1%  +2.62%  (p=0.000 n=10+10)
Flate            229ms ± 4%        228ms ± 6%    ~     (p=0.796 n=10+10)
GoParser         271ms ± 3%        275ms ± 4%    ~     (p=0.165 n=10+10)
Reflect          779ms ± 5%        775ms ± 2%    ~     (p=0.971 n=10+10)
Tar              317ms ± 4%        319ms ± 5%    ~     (p=0.853 n=10+10)
XML              404ms ± 4%        409ms ± 5%    ~     (p=0.436 n=10+10)

name       old alloc/op      new alloc/op      delta
Template        34.9MB ± 0%       35.0MB ± 0%  +0.26%  (p=0.000 n=10+10)
Unicode         29.3MB ± 0%       29.3MB ± 0%  +0.02%  (p=0.000 n=10+10)
GoTypes          115MB ± 0%        115MB ± 0%  +0.30%  (p=0.000 n=10+10)
Compiler         519MB ± 0%        521MB ± 0%  +0.30%  (p=0.000 n=10+10)
SSA             1.55GB ± 0%       1.57GB ± 0%  +1.34%  (p=0.000 n=10+9)
Flate           24.1MB ± 0%       24.2MB ± 0%  +0.10%  (p=0.000 n=10+10)
GoParser        28.1MB ± 0%       28.1MB ± 0%  +0.07%  (p=0.000 n=10+10)
Reflect         78.7MB ± 0%       78.7MB ± 0%  +0.03%  (p=0.000 n=8+10)
Tar             34.4MB ± 0%       34.5MB ± 0%  +0.12%  (p=0.000 n=10+10)
XML             43.2MB ± 0%       43.2MB ± 0%  +0.13%  (p=0.000 n=10+10)

name       old allocs/op     new allocs/op     delta
Template          330k ± 0%         330k ± 0%  -0.01%  (p=0.017 n=10+10)
Unicode           337k ± 0%         337k ± 0%  +0.01%  (p=0.000 n=9+10)
GoTypes          1.15M ± 0%        1.15M ± 0%  +0.03%  (p=0.000 n=10+10)
Compiler         4.77M ± 0%        4.77M ± 0%  +0.03%  (p=0.000 n=9+10)
SSA              12.5M ± 0%        12.6M ± 0%  +1.16%  (p=0.000 n=10+10)
Flate             221k ± 0%         221k ± 0%  +0.05%  (p=0.000 n=9+10)
GoParser          275k ± 0%         275k ± 0%  +0.01%  (p=0.014 n=10+9)
Reflect           944k ± 0%         944k ± 0%  -0.02%  (p=0.000 n=10+10)
Tar               324k ± 0%         323k ± 0%  -0.12%  (p=0.000 n=10+10)
XML               384k ± 0%         384k ± 0%  -0.01%  (p=0.001 n=10+10)

name       old object-bytes  new object-bytes  delta
Template         476kB ± 0%        476kB ± 0%  -0.04%  (p=0.000 n=10+10)
Unicode          218kB ± 0%        218kB ± 0%    ~     (all equal)
GoTypes         1.58MB ± 0%       1.58MB ± 0%  -0.04%  (p=0.000 n=10+10)
Compiler        6.25MB ± 0%       6.24MB ± 0%  -0.09%  (p=0.000 n=10+10)
SSA             15.9MB ± 0%       16.1MB ± 0%  +1.22%  (p=0.000 n=10+10)
Flate            304kB ± 0%        304kB ± 0%  -0.13%  (p=0.000 n=10+10)
GoParser         370kB ± 0%        370kB ± 0%  -0.00%  (p=0.000 n=10+10)
Reflect         1.27MB ± 0%       1.27MB ± 0%  -0.12%  (p=0.000 n=10+10)
Tar              421kB ± 0%        419kB ± 0%  -0.64%  (p=0.000 n=10+10)
XML              518kB ± 0%        517kB ± 0%  -0.12%  (p=0.000 n=10+10)

name       old export-bytes  new export-bytes  delta
Template        16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)
Unicode         6.52kB ± 0%       6.52kB ± 0%    ~     (all equal)
GoTypes         29.2kB ± 0%       29.2kB ± 0%    ~     (all equal)
Compiler        88.0kB ± 0%       88.0kB ± 0%    ~     (all equal)
SSA              109kB ± 0%        109kB ± 0%    ~     (all equal)
Flate           4.49kB ± 0%       4.49kB ± 0%    ~     (all equal)
GoParser        8.10kB ± 0%       8.10kB ± 0%    ~     (all equal)
Reflect         7.71kB ± 0%       7.71kB ± 0%    ~     (all equal)
Tar             9.15kB ± 0%       9.15kB ± 0%    ~     (all equal)
XML             12.3kB ± 0%       12.3kB ± 0%    ~     (all equal)

name       old text-bytes    new text-bytes    delta
HelloSize        676kB ± 0%        672kB ± 0%  -0.59%  (p=0.000 n=10+10)
CmdGoSize       7.26MB ± 0%       7.24MB ± 0%  -0.18%  (p=0.000 n=10+10)

name       old data-bytes    new data-bytes    delta
HelloSize       10.2kB ± 0%       10.2kB ± 0%    ~     (all equal)
CmdGoSize        248kB ± 0%        248kB ± 0%    ~     (all equal)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
CmdGoSize        145kB ± 0%        145kB ± 0%    ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.46MB ± 0%       1.45MB ± 0%  -0.31%  (p=0.000 n=10+10)
CmdGoSize       14.7MB ± 0%       14.7MB ± 0%  -0.17%  (p=0.000 n=10+10)

Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285
Reviewed-on: https://go-review.googlesource.com/106495
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018
sbinet added a commit to sbinet-hep/hep that referenced this issue Mar 25, 2020
sbinet added a commit to sbinet-hep/hep that referenced this issue Mar 25, 2020
dgryski added a commit to dgryski/go-metro that referenced this issue Aug 12, 2020
Small arrays are not SSA-able. By moving to regular variables,
these values are kept in registers rather than on the stack.

golang/go#15925

name         old time/op    new time/op     delta
Hash/100-8     20.2ns ± 3%     17.6ns ± 4%  -12.89%  (p=0.000 n=29+30)
Hash/1024-8     109ns ± 2%       84ns ± 2%  -23.51%  (p=0.000 n=30+26)
Hash/8192-8     824ns ± 3%      594ns ± 1%  -27.98%  (p=0.000 n=29+26)

name         old speed      new speed       delta
Hash/100-8   4.95GB/s ± 2%   5.69GB/s ± 4%  +14.86%  (p=0.000 n=28+29)
Hash/1024-8  9.35GB/s ± 3%  12.24GB/s ± 2%  +30.88%  (p=0.000 n=30+26)
Hash/8192-8  9.94GB/s ± 3%  13.80GB/s ± 1%  +38.83%  (p=0.000 n=29+26)
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022