Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: potential compiler regression in 1.16 with heavy arithmetic #43971

Closed
zx2c4 opened this issue Jan 28, 2021 · 2 comments
Closed

cmd/compile: potential compiler regression in 1.16 with heavy arithmetic #43971

zx2c4 opened this issue Jan 28, 2021 · 2 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

Comments

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 28, 2021

I just noticed a performance regression in a small benchmark with 1.16. Here's the code in question:

package test

import (
        "crypto/rand"
        "testing"
)

func BenchmarkBitslice(b *testing.B) {
        var key [32]byte
        var hex [64]byte
        rand.Read(key[:])

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                for j := 0; j < len(key); j++ {
                        hex[j*2] = byte(87 + (int(key[j]) >> 4) + ((((int(key[j]) >> 4) - 10) >> 8) & ^38))
                        hex[j*2+1] = byte(87 + (int(key[j]) & 0xf) + ((((int(key[j]) & 0xf) - 10) >> 8) & ^38))
                }
        }
}

func BenchmarkLookuptable(b *testing.B) {
        var key [32]byte
        var hex [64]byte
        rand.Read(key[:])

        b.ResetTimer()
        const table = "0123456789abcdef"
        for i := 0; i < b.N; i++ {
                for j := 0; j < len(key); j++ {
                        hex[j*2] = table[key[j]>>4]
                        hex[j*2+1] = table[key[j]&0xf]
                }
        }
}

I'm benching on a Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz using go test -v -benchtime 5s -bench Benchmark*.

Go 1.15:

BenchmarkBitslice-16            408505777               14.4 ns/op
BenchmarkLookuptable-16         389753908               15.6 ns/op

Go 1.16:

BenchmarkBitslice-16            385976983               15.36 ns/op
BenchmarkLookuptable-16         418508672               14.34 ns/op

The bitsliced performance got worse and the lookup table performance got better.

I'm not overly concerned about this, but it is somewhat curious. CC @josharian @bradfitz

@seankhliao seankhliao changed the title potential compiler regression in 1.16 with heavy arithmetic cmd/compile: potential compiler regression in 1.16 with heavy arithmetic Jan 28, 2021
@seankhliao seankhliao added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Jan 28, 2021
@ALTree
Copy link
Member

ALTree commented Jan 28, 2021

It seems like this is just a benchmarking artefact. I've run with -benchtime 2s -count 10 on my machine and compared the results with benchstat, and I got the opposite result (Lookuptable got <1ns slower).

name           old time/op  new time/op  delta
Bitslice-4     27.2ns ± 3%  27.0ns ± 4%    ~     (p=0.565 n=10+10)
Lookuptable-4  26.4ns ± 4%  27.2ns ± 5%  +2.70%  (p=0.050 n=10+10)

But it may be caused by different code alignment. More importantly, you can compare the machine code generated by 1.15 and 1.16 using

$ go test -gcflags -S p_test.go

And I see no meaningful difference between the two, which seems to support the "it's probably just benchmarking noise" explanation. You're measuring a <1ns difference, after all.

@randall77
Copy link
Contributor

The assembly for BitSlice looks identical to me. The only difference I see is the offset used to load b.N.
I'm going to close this issue, as I don't think there is anything to do here.

@golang golang locked and limited conversation to collaborators Jan 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

5 participants