Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: missed trivial bound check elimination #66826

Open
FiloSottile opened this issue Apr 14, 2024 · 7 comments
Open

cmd/compile: missed trivial bound check elimination #66826

FiloSottile opened this issue Apr 14, 2024 · 7 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@FiloSottile
Copy link
Contributor

Go version

go version go1.22.2 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/filippo/Library/Caches/go-build'
GOENV='/Users/filippo/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/filippo/pkg/mod'
GONOPROXY='github.com/FiloSottile/*,filippo.io/*'
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/filippo'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org'
GOROOT='/Users/filippo/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.2.darwin-arm64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/filippo/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.2.darwin-arm64/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.2'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/filippo/src/filippo.io/mlkem768/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/_j/hq4ytn1n4b94fhrpvvb9tktr0000gn/T/go-build3556564162=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I have this function in a fairly hot loop.

const n = 256
type fieldElement uint16
type nttElement [n]fieldElement

func nttMul(f, g nttElement) nttElement {
	var h nttElement
	for i := 0; i < 128; i++ {
		a0, a1 := f[2*i], f[2*i+1]
		b0, b1 := g[2*i], g[2*i+1]
		h[2*i] = fieldAdd(fieldMul(a0, b0), fieldMul(fieldMul(a1, b1), gammas[i]))
		h[2*i+1] = fieldAdd(fieldMul(a0, b1), fieldMul(a1, b0))
	}
	return h
}

What did you see happen?

Both f[2*i] and f[2*i+1] get a isInBounds(). g[2*i] and g[2*i+1] don't.

What did you expect to see?

One or zero bounds checks.

One if the compiler realized that if f[2*i+1] is in bounds with i positive and small (0 to 127), then f[2*i] is clearly in bounds.

Zero if the compiler figured out that i is 0 to 127, so 2*i+1 is 0 to 255, and the size of f is a constant 256.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 14, 2024
@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 15, 2024
@cherrymui cherrymui modified the milestones: Unplanned, Backlog Apr 15, 2024
@go101
Copy link

go101 commented Apr 16, 2024

It looks there is a workaround now:

func nttMul(f, g nttElement) nttElement {
	var h nttElement
	for i := 0; i < 256; i += 2 {
		a0, a1 := f[i], 
		f[i+1]
		b0, b1 := g[i], 
		g[i+1]
		_, _, _, _ = a0, a1, b0, b1
	}
	return h
}

Not benchmark it yet.

[edit] BTW, why not use slices as arguments:

func nttMul(f, g []fieldElement) nttElement {
	var h nttElement
	f, g = f[:256], g[:256]
	for i := 0; i < 256; i += 2 {
		a0, a1 := f[i], 
		f[i+1]
		b0, b1 := g[i], 
		g[i+1]
		_, _, _, _ = a0, a1, b0, b1
	}
	return h
}

@FiloSottile
Copy link
Contributor Author

That does indeed remove the bounds checks on f, but we get one on gammas[i/2], which also feels avoidable.

Slices should in theory add overhead compared to const size arrays and remove information from the type system

@go101
Copy link

go101 commented Apr 16, 2024

Yes, there are still several cases BCE fails to handle.

Now, a workaround is to double the length of gammas and only uses its elements at 2N indexes.

@FiloSottile
Copy link
Contributor Author

Now, a workaround is to double the length of gammas and only uses its elements at 2N indexes.

Interestingly, that benchmarks decidedly worse, eating between 10% and 20% of the speedup of dropping the f bounds checks. 🤷

@go101
Copy link

go101 commented Apr 16, 2024

It is possible. BCE will not always get positive effects.

Another way to remove all bound checks is declare fieldElement as [2]uint16. I'm not sure about the final effect.

@go101
Copy link

go101 commented Apr 16, 2024

The test shows that double length of gammas is the most performant: https://gotipplay.golang.org/p/VKJDqUPuhKV (maybe the implementation of fieldMul and fieldAdd will affect the results).

goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_f1-4   	 1762314	       680.7 ns/op
Benchmark_f2-4   	 3191974	       374.8 ns/op
Benchmark_f3-4   	 3488221	       343.2 ns/op
Benchmark_f4-4   	 1893159	       630.0 ns/op

@randall77
Copy link
Contributor

Simple reproducer:

func f(a [256]byte) {
	for i := 0; i < 128; i++ {
		_ = a[2*i]
	}
}

We should be able to get rid of the bounds check. Currently I think it fails because the prove pass can't tell that the math 2*i doesn't overflow and become negative. It does seem to grok that 2*i < 256.
Should be fixable. That code is delicate though, it is not immediately obvious to me where that fix would go.

FiloSottile added a commit to FiloSottile/mlkem768 that referenced this issue Apr 18, 2024
go: go1.22.2
goos: linux
goarch: amd64
pkg: filippo.io/mlkem768
cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
                  │   e10936b   │           e10936b-dirty            │
                  │   sec/op    │   sec/op     vs base               │
KeyGen-4            65.46µ ± 1%   64.41µ ± 1%  -1.61% (p=0.000 n=20)
Encaps-4            98.63µ ± 1%   97.28µ ± 0%  -1.37% (p=0.000 n=20)
Decaps-4            98.16µ ± 0%   96.31µ ± 0%  -1.88% (p=0.000 n=20)
RoundTrip/Alice-4   172.4µ ± 0%   169.1µ ± 0%  -1.90% (p=0.000 n=20)
RoundTrip/Bob-4     98.76µ ± 0%   97.45µ ± 0%  -1.32% (p=0.000 n=20)
geomean             101.5µ        99.89µ       -1.62%

See golang/go#66826
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

5 participants