cmd/compile: consider teaching prove about unexported integer fields #28952

mvdan · 2018-11-26T09:59:26Z

Consider the following program:

$ cat f.go
package p

type T struct {
	intOff  int
	uintOff uint
	data    []byte
}

func (t *T) reset(data []byte) {
	t.intOff = 0
	t.uintOff = 0
	t.data = data
}

func withIntVar(data []byte) {
	i := 0
	for i < len(data) {
		_ = data[i] // no bounds check
		i++
	}
}

func (t *T) withIntField(data []byte) {
	for t.intOff < len(t.data) {
		_ = t.data[t.intOff] // bounds check!
		t.intOff++
	}
}

func (t *T) withUintField(data []byte) {
	for t.uintOff < uint(len(t.data)) {
		_ = t.data[t.uintOff] // no bounds check
		t.uintOff++
	}
}
$ go version
go version devel +048c9164a0 Sat Nov 24 23:55:07 2018 +0000 linux/amd64
$ go build -gcflags=-d=ssa/check_bce/debug=1 f.go
# command-line-arguments
./f.go:25:13: Found IsInBounds

Lengths are signed, int is simple, and it's the default integer type, so it's understandable why most indexes and offsets are signed. This works fine in our first example; we get no bounds checks.

However, this breaks down when we start using struct field integers, if the loop logic is split between multiple methods. We can see that the second example has a bounds check. Only making the field unsigned is enough to convince prove/BCE that the offset is always within the bounds of the slice, presumably because in the signed integer field case the compiler isn't smart enough to figure out that t.intOff >= 0 in all cases.

For example, see https://go-review.googlesource.com/c/go/+/150919, where the JSON decoder gets ~1.5% faster by just making the offset field unsigned, but at the cost of adding more type conversions in a bunch of places.

I presume that we could prove that a signed integer field is never negative, granted that it's an unexported field and is only ever assigned non-negative values, such as:

t.field = N, where N is bounded to be non-negative
if t.field < N { t.field++ }, where N is bounded to be non-negative

The JSON example above does use d.off = len(d.data) + 1 to mark an EOF, which could get the field to overflow if len(d.data) is the maximum integer value, but I presume we could rewrite it to use d.off = len(data). Otherwise, all the assignments seem to keep the offset within non-negative bounds.

Ideally, the prove pass would be smart enough to also work with variations of these rules, such as the code below. But that could be tracked as a separate issue.

i := t.intOff
data := t.data
for i < len(data) {
    _ = data[i]
    i++
}
t.intOff = i

If there's a good reason why the prove pass can't ever deal with fields, please feel free to close this issue. I'm working with the assumption that it's possible, but hasn't been implemented yet.

/cc @aclements @rasky @josharian @bradfitz

The text was updated successfully, but these errors were encountered:

mvdan · 2018-11-26T10:02:19Z

The only disadvantage I can imagine with doing this is that exporting the field or adding a method that sets the offset to -1 could mean a confusing loss of performance for the user. But I think that's just how the prove pass currently works. If one modifies code to be too clever or to unintentionally break BCE, they can already see slow-downs due to added bounds checks.

josharian · 2018-11-26T14:32:09Z

We don’t do much cross-function (whole package) analysis, and it’s all before SSA: typechecking, escape analysis, inlining. Doing whole package analysis in SSA is possible but would make concurrent compilation much less effective. And it’s not really built for it. I’d say this is probably unlikely to happen.

mvdan · 2018-11-26T14:38:07Z

I imagined that the answer would be no, but I still think it's worth writing this down in an issue :) For example, I think this makes using unsigned integers for offset/index fields reasonable, if BCE matters.

cherrymui · 2018-11-26T15:55:01Z

Can the compiler hoist the bound check outside the loop, if the integer variable is only increasing and cannot overflow? In this case, if it panics it will panic on the first iteration, and if the first iteration doesn't panic, it won't panic.

This would be a local optimization, and I think it would bring most of the performance benefit.

mvdan · 2018-11-26T16:16:12Z

@cherrymui you mean rewriting the code to be like:

func (t *T) withIntField(data []byte) {
        if t.intOff < 0 {
                panic("out of bounds...")
        }
	for t.intOff < len(t.data) {
		_ = t.data[t.intOff] // no bounds check
		t.intOff++
	}
}

I'm not sure how that would work, though. If there's an actual out of bounds panic, what line would the panic point to? Presumably the original _ = t.data[t.intOff], even though the statement won't actually contain a bounds check.

Another workaround for users could be to do that transformation themselves, meaning that there wouldn't be a need for the field to be unsigned. That doesn't work at the moment, but could be a simple enough change to the prove pass, without needing to care about fields:

func withIntVar(i int, data []byte) {
        if i < 0 {
                panic("unreachable")
        }
        for i < len(data) {
                _ = data[i] // unexpected bounds check
                i++
        }
}

cherrymui · 2018-11-26T16:44:29Z

@mvdan yes. The compiler can annotate the panic as the same line number of _ = t.data[t.intOff]. In another word, the compiler compiles the statement _ = t.data[t.intOff] to two pieces of code with the same line number (which the compiler already do in various ways), one outside the loop, one inside the loop.

cherrymui · 2018-11-26T16:46:25Z

the original _ = t.data[t.intOff], even though the statement won't actually contain a bounds check.

Or say it this way: this statement does have a bound check (which is expected), just not in every iteration.

randall77 · 2018-11-26T16:48:49Z

This sounds eerily familiar:
https://dl.acm.org/citation.cfm?id=378850

mvdan · 2018-11-26T16:50:08Z

this statement does have a bound check (which is expected), just not in every iteration.

Sounds doable. I think I'll raise a separate issue about that non-field BCE failure I showed at the end of my last comment, as I don't think it's related to this issue or its potential fix.

mvdan · 2018-11-26T16:54:40Z

Separated that into #28956.

josharian · 2018-11-26T16:58:08Z

@cherrymui if the loop iterations have observable side effects, I think it’d be strange to panic partway through at a line number previous to the loop.

cherrymui · 2018-11-26T17:12:26Z

@josharian to be clear, I don't mean to do anything with line numbers. Currently the compiler generates a bound check, with the same line number, runs on every iteration. What I mean is that the compiler generates a bound check, with the same line number, runs only once.

Side effect is a good point, though. I can see it is harder to transform code like

	for t.intOff < len(t.data) {
		sideEffect() // can be a lot of code
		_ = t.data[t.intOff]
		t.intOff++
	}

It'd probably need to unroll the loop once, which is more complex and may or may not be beneficial when the iteration number is small.

mvdan added the Performance label Nov 26, 2018

mvdan added this to the Unplanned milestone Nov 26, 2018

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: consider teaching prove about unexported integer fields #28952

cmd/compile: consider teaching prove about unexported integer fields #28952

mvdan commented Nov 26, 2018

mvdan commented Nov 26, 2018

josharian commented Nov 26, 2018

mvdan commented Nov 26, 2018

cherrymui commented Nov 26, 2018

mvdan commented Nov 26, 2018

cherrymui commented Nov 26, 2018

cherrymui commented Nov 26, 2018

randall77 commented Nov 26, 2018

mvdan commented Nov 26, 2018

mvdan commented Nov 26, 2018

josharian commented Nov 26, 2018

cherrymui commented Nov 26, 2018

cmd/compile: consider teaching prove about unexported integer fields #28952

cmd/compile: consider teaching prove about unexported integer fields #28952

Comments

mvdan commented Nov 26, 2018

mvdan commented Nov 26, 2018

josharian commented Nov 26, 2018

mvdan commented Nov 26, 2018

cherrymui commented Nov 26, 2018

mvdan commented Nov 26, 2018

cherrymui commented Nov 26, 2018

cherrymui commented Nov 26, 2018

randall77 commented Nov 26, 2018

mvdan commented Nov 26, 2018

mvdan commented Nov 26, 2018

josharian commented Nov 26, 2018

cherrymui commented Nov 26, 2018