cmd/compile: benchmark performance regression in 1.9beta1 #20711

cespare · 2017-06-17T02:19:09Z

Sorry for the poor title -- I haven't done enough digging to know how to be more specific about the problem.

I'm comparing the following go versions:

go version go1.8 linux/amd64
go version go1.9beta1 linux/amd64

The following benchmark exhibits a significant slowdown:

```go
package main

import (
	"testing"
)

type MPG []P

type P struct {
	p   uint16
	idx uint32
}

type Dim interface {
	g(mpg MPG, scale uint32)
}

type dim struct {
	cells []uint32
}

func (d *dim) g(mpg MPG, scale uint32) {
	// Per element: one load from d.cells and one read-modify-write of mpg[i].idx.
	for i, p := range mpg {
		mpg[i].idx += scale * uint32(d.cells[int(p.p)])
	}
}

func BenchmarkGM(b *testing.B) {
	const numRows = 50000
	cells := make([]uint32, numRows)
	// From this declaration on, the variable dim shadows the type dim.
	var dim Dim = &dim{cells: cells}
	for i := range cells {
		cells[i] = uint32(i % 2)
	}
	mpg := make(MPG, numRows)
	for i := range mpg {
		mpg[i].p = uint16(i)
	}
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		dim.g(mpg, 5)
	}
}
```

```
name  old time/op  new time/op  delta
GM-4  59.6µs ± 0%  71.4µs ± 1%  +19.80%  (p=0.008 n=5+5)
```

This is extracted from a big suite of dozens of similar benchmarks in an internal codebase. Most of these benchmarks stayed the same or got faster in Go 1.9beta1, but a few got slower. I know that low-level codegen optimizations often make many benchmarks speed up while causing a few to slow down, but I thought it would be good to bring an example slowdown here in case it's not one of those. (Overall we're very happy with performance changes in 1.8->1.9beta1.)
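As a quick check on the table: the delta column is the relative change in mean time/op, (new - old) / old. A minimal Go sketch (the name deltaPercent is ours, not part of benchstat) reproduces the +19.80% from the two displayed means. benchstat works from unrounded means, so recomputing from the rounded figures can occasionally differ in the last digit.

```go
package main

import "fmt"

// deltaPercent returns the relative change from before to after, in percent.
// For a time/op metric, a positive value means the new build is slower.
func deltaPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Mean time/op in µs from the table: go1.8 vs go1.9beta1.
	fmt.Printf("%+.2f%%\n", deltaPercent(59.6, 71.4)) // prints +19.80%
}
```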


bradfitz · 2017-06-17T06:12:01Z

/cc @randall77 @josharian @mdempsky

josharian · 2017-06-17T23:05:38Z

Any chance of a bisection?

cespare · 2017-06-18T08:40:40Z

Yeah, I'll do one in the next few days.

gopherbot · 2017-06-20T20:38:11Z

CL https://golang.org/cl/46134 mentions this issue.

randall77 · 2017-06-21T18:30:59Z

Looks like there is an additional bounds check at tip. Here's a simple repro:

```go
func f(a []int) {
	for i, x := range a {
		a[i] += x
	}
}
```

For 1.8.3, it generates an inner loop of:

```
	0x0017 00023 (/home/khr/go/tmp2.go:5)	MOVQ	(DX)(BX*8), DI
	0x001b 00027 (/home/khr/go/tmp2.go:5)	ADDQ	DI, SI
	0x001e 00030 (/home/khr/go/tmp2.go:5)	MOVQ	SI, (DX)(BX*8)
	0x0022 00034 (/home/khr/go/tmp2.go:4)	ADDQ	$8, AX
	0x0026 00038 (/home/khr/go/tmp2.go:4)	INCQ	BX
	0x0029 00041 (/home/khr/go/tmp2.go:4)	CMPQ	BX, CX
	0x002c 00044 (/home/khr/go/tmp2.go:4)	JLT	$0, 20
```

For tip, it generates:

```
	0x0023 00035 (/home/khr/go/tmp2.go:5)	CMPQ	BX, AX	<- extra bounds
	0x0026 00038 (/home/khr/go/tmp2.go:5)	JCC	72	<- check here
	0x0028 00040 (/home/khr/go/tmp2.go:5)	MOVQ	(DX)(BX*8), DI
	0x002c 00044 (/home/khr/go/tmp2.go:5)	ADDQ	DI, SI
	0x002f 00047 (/home/khr/go/tmp2.go:5)	MOVQ	SI, (DX)(BX*8)
	0x0033 00051 (/home/khr/go/tmp2.go:4)	ADDQ	$8, CX
	0x0037 00055 (/home/khr/go/tmp2.go:4)	INCQ	BX
	0x003a 00058 (/home/khr/go/tmp2.go:4)	CMPQ	BX, AX
	0x003d 00061 (/home/khr/go/tmp2.go:4)	JLT	32
```

@dr2chase, I think this happens because of your change to do the additional len>0 comparison wrapping the loop. Instead of the loop exit check dominating the bounds check (as it does in 1.8), the bounds check block now has two predecessors, each of which does a comparison (one is the len>0 block, the other is the loop exit check). Only by combining the information from the two predecessors can we deduce that the bounds check is unnecessary.

Merging such information would be hard. Maybe there's another way to achieve the same result? And what would be the downside of backing out your change?
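To make the two-predecessor shape concrete, here is a hand-written Go approximation of the f repro once the loop is wrapped in a len>0 guard and its exit test sits at the bottom. It is an illustration only (the name addSelfRotated and the goto structure are ours, not compiler output): the block that indexes a[i] is entered both from the guard and from the back edge, and i < len(a) holds on each path for a different reason.

```go
package main

import "fmt"

// addSelfRotated hand-writes the rotated, guarded form of
//
//	for i, x := range a { a[i] += x }
//
// as an illustration of the control flow, not as actual compiler output.
func addSelfRotated(a []int) {
	n := len(a)
	i := 0
	if n > 0 { // predecessor 1: the len>0 guard; with i == 0 it implies i < n
	loop:
		// This block has two predecessors: the guard (first iteration) and
		// the back edge below (later iterations). i < n holds on both paths,
		// but neither comparison alone dominates the block, so proving the
		// a[i] bounds check redundant requires merging the two facts.
		x := a[i]
		a[i] += x
		i++
		if i < n { // predecessor 2: the bottom-of-loop exit test
			goto loop
		}
	}
}

func main() {
	a := []int{1, 2, 3}
	addSelfRotated(a)
	fmt.Println(a) // [2 4 6]

	addSelfRotated(nil) // the guard skips the loop body entirely
}
```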

cespare · 2017-06-21T18:37:18Z

Would my doing a bisect still be helpful? Meant to do it earlier but it has been an extremely busy week for me.

dr2chase · 2017-06-21T18:45:50Z

Backing out the change fully creates a problem for GOEXPERIMENT=preemptibleloops; one possibility is to instead make the loop change conditional on whether the experiment is turned on (but we expect to turn it on for 1.10). Is including both copies of the loop-translating code and switching based on goexperiment considered adequately low-risk?

randall77 · 2017-06-21T18:52:47Z

@dr2chase Are preemptible loops a problem because the preempt check comes after the pointer increment?

@cespare: I don't think we need a bisect. Thanks though. I just mailed a demonstration CL that I think shows we're on the right track.

gopherbot · 2017-06-21T18:55:03Z

CL https://golang.org/cl/46331 mentions this issue.

randall77 · 2017-06-21T18:59:16Z

@dr2chase: I have no problem including two versions of the loop-translating code and switching on goexperiment. It all depends on how big the CL is and what confidence we have in it.
I would hope we have high confidence in the old code; if we could just splice that in when not under the experiment, that would be ideal.

dr2chase · 2017-06-21T19:04:49Z

@randall77 Exactly that. And given how tricky it was to update the SSA graph to include the preemption check, I wasn't willing to put it anywhere other than the back edge, though perhaps for 1.10 we should get braver, to reduce the cost of actually turning preemptible loops on all the time.
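As a rough source-level analogy for a back-edge check (not the compiler's actual transformation; yieldRequested is a hypothetical stand-in for whatever runtime state the experiment polls), the poll sits at the end of each iteration, on the path back to the loop header. Every iteration of a hot loop therefore pays for a load and a compare even when no preemption is pending, which is the cost that would need to shrink before the experiment could be on all the time.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// yieldRequested is a hypothetical flag standing in for the runtime's
// "preemption pending" state; nonzero means "please yield".
var yieldRequested int32

// sumWithBackEdgePoll sums a, polling yieldRequested once per iteration at
// the end of the body, just before control returns to the loop header.
func sumWithBackEdgePoll(a []int) int {
	s := 0
	for i := 0; i < len(a); i++ {
		s += a[i]
		// Back-edge poll: cheap when the flag is clear, but paid every iteration.
		if atomic.LoadInt32(&yieldRequested) != 0 {
			runtime.Gosched() // let other goroutines run
		}
	}
	return s
}

func main() {
	a := []int{1, 2, 3, 4}
	fmt.Println(sumWithBackEdgePoll(a)) // 10

	atomic.StoreInt32(&yieldRequested, 1)
	fmt.Println(sumWithBackEdgePoll(a)) // still 10; the loop just yields each iteration
}
```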

dr2chase · 2017-06-21T20:09:45Z

If you haven't started on it: I just did, and I should have a CL up as soon as it passes tests locally.

cespare · 2017-06-21T21:09:33Z

Thanks very much @dr2chase and @randall77.

gopherbot · 2017-06-21T21:24:45Z

CL https://golang.org/cl/46410 mentions this issue.

Currently we only use 1 and 4 as a scale for indexed 4-byte loads. In the code generated in #20711 we can use an indexed load with scale=8 to improve performance:

```
name  old time/op  new time/op  delta
GM-6   108µs ± 0%    95µs ± 0%  -12.06%  (p=0.000 n=10+10)
```

So add new ops and combine loadidx1(shift 3..).. into loadidx8; same for stores.

Change-Id: I5ed1c250ac40960e20606580cf9de221e75b72f1
Reviewed-on: https://go-review.googlesource.com/46134
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
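Why scale 8 fits this benchmark: an element of MPG is a P, which takes 8 bytes (a 2-byte p, 2 bytes of padding, then a 4-byte idx at offset 4). So mpg[i].idx is the 4-byte value at base + i*8 + 4, which is the base + index*scale + displacement form of an x86 indexed load with scale 8. The sketch below re-declares P from the benchmark and checks that layout with package unsafe; the helper names elemSize, idxOffset and idxAddrMatches are ours.

```go
package main

import (
	"fmt"
	"unsafe"
)

// P re-declares the element type from the benchmark.
type P struct {
	p   uint16
	idx uint32
}

// elemSize is the stride between consecutive elements of a []P.
func elemSize() uintptr { return unsafe.Sizeof(P{}) }

// idxOffset is the byte offset of the idx field within a P.
func idxOffset() uintptr { return unsafe.Offsetof(P{}.idx) }

// idxAddrMatches reports whether &mpg[i].idx == base + i*elemSize() + idxOffset()
// for every i, i.e. whether a scaled-index load with that stride and
// displacement addresses each idx field.
func idxAddrMatches(mpg []P) bool {
	if len(mpg) == 0 {
		return true
	}
	base := uintptr(unsafe.Pointer(&mpg[0]))
	for i := range mpg {
		want := base + uintptr(i)*elemSize() + idxOffset()
		if uintptr(unsafe.Pointer(&mpg[i].idx)) != want {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(elemSize(), idxOffset())          // 8 4
	fmt.Println(idxAddrMatches(make([]P, 50000))) // true
}
```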

cespare added this to the Go1.9 milestone Jun 17, 2017

bradfitz added Performance NeedsInvestigation labels Jun 17, 2017

gopherbot closed this as completed in 0b6fbaa Jun 21, 2017

golang locked and limited conversation to collaborators Jun 21, 2018

gopherbot added the FrozenDueToAge label Jun 21, 2018
