bytes: relatively poor performance in bytes.Compare #62648

pconstantinou · 2023-09-14T18:33:54Z

What version of Go are you using (`go version`)?

1.21

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/Users/phil/Library/Caches/go-build'
GOENV='/Users/phil/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/phil/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/phil/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_amd64'
GOVCS=''
GOVERSION='go1.21.0'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/2w/sj4gll256rqfs6r4sgwcn3lh0000gn/T/go-build2454689691=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

For fixed length arrays (such as github.com/google/uuid.UUID) bytes.Compare perform poorly relative to a handwritten:

func uuidCmpByte(a, b uuid.UUID) int {
	for i := 0; i < 16; i++ {
		if d := int(a[i]) - int(b[i]); d != 0 {
			return d
		}
	}
	return 0
}

Bench results where there's a

BenchmarkBytesCompare-8   	     262	   4459605 ns/op	       0 B/op	       0 allocs/op
BenchmarkArrayCompare-8   	     890	   1437192 ns/op	       0 B/op	       0 allocs/op

What did you expect to see?

I expected the internal bytes.Compare to perform optimally.

What did you see instead?

Probably due to the conversion a slice and the inability to unroll the loop, bytes.Compare performed about 3x worse.

It would be ideal to have a function that would compare fixed-length arrays optimally.


const sliceSize = 1000

func uuidCmpByte(a, b uuid.UUID) int {
	for i := 0; i < 16; i++ {
		if d := int(a[i]) - int(b[i]); d != 0 {
			return d
		}
	}
	return 0
}

func BenchmarkBytesCompare(b *testing.B) {
	uuids := make([]uuid.UUID, 0, sliceSize)
	for i := 0; i < sliceSize; i++ {
		uuids = append(uuids, uuid.New())
	}
	for _, uid := range uuids[:10] {
		uuids = append(uuids, uid)
	}
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		for _, a := range uuids {
			for _, b := range uuids {
				_ = bytes.Compare(a[:], b[:])
			}
		}

	}
}

func BenchmarkArrayCompare(b *testing.B) {
	uuids := make([]uuid.UUID, 0, sliceSize)
	for i := 0; i < sliceSize; i++ {
		uuids = append(uuids, uuid.New())
	}
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		for _, a := range uuids {
			for _, b := range uuids {
				_ = uuidCmpByte(a, b)
			}
		}
	}
}

The text was updated successfully, but these errors were encountered:

ALTree · 2023-09-14T19:29:46Z

Your compare function is not strictly equivalent to bytes.Compare because the latter guarantees a return value of -1, 0, or +1, while yours just returns the diff. If I change it to follow the bytes.Compare interface:

func uuidCmpByte2(a, b uuid.UUID) int {
	for i := 0; i < 16; i++ {
		d := int(a[i]) - int(b[i])
		if d < 0 {
			return -1
		} else if d > 0 {
			return 1
		}
	}
	return 0
}

I get:

name             time/op
BytesCompare-4   5.03ms ± 1%
ArrayCompare-4   1.59ms ± 1%
ArrayCompare2-4  5.08ms ± 2%

randall77 · 2023-09-14T19:30:33Z

This largely has to do with how different the UUIDs are.
When they differ on the first byte, then your hand-written comparison will be really fast because it exits the loop immediately.
When they differ later in the UUID, or are in fact equal, then the bytes version will be faster. I measure more than a factor of 2 faster. (e.g. change from uuid.New to uuid.UUID{}.)

Of course, it would be nice to have the best of both worlds. Maybe there is some early-exit things we can do to improve the obviously-different case.

pconstantinou · 2023-09-14T19:52:51Z

I was also hoping to find a way to cast (without a performance hit) the [16]byte into [2]uint64 thinking that would improve things. I figure the internal implementation of bytes.Compare did that. I'd assumed that some of the performance gains came from being able to unroll the loop.

I agree that the ratio of hits to misses is influential. Thanks for taking a look.

klauspost · 2023-09-15T15:37:24Z

I was also hoping to find a way to cast (without a performance hit) the [16]byte into [2]uint64

There is no penalty with binary.LittleEndian.Uint64(x[0:8]) on amd64.

See https://godbolt.org/z/hxbeYE5E9 - both parts are loaded with a single MOVQ and no bounds checks are needed.

shoham-b · 2024-01-02T08:33:20Z

This is because of the hand written function is inlined.

goos: darwin
goarch: arm64
pkg: go-playground/pkg
BenchmarkBytesCompare
BenchmarkBytesCompare-8                  	10990532	       106.1 ns/op
BenchmarkArrayCompare
BenchmarkArrayCompare-8                  	13332518	        89.93 ns/op
BenchmarkArrayCompareNoInline
BenchmarkArrayCompareNoInline-8          	 8318890	       138.7 ns/op
BenchmarkBytesCompareAllSame
BenchmarkBytesCompareAllSame-8           	 9992674	       119.0 ns/op
BenchmarkArrayCompareAllSame
BenchmarkArrayCompareAllSame-8           	 3212157	       371.1 ns/op
BenchmarkArrayCompareAllSameNoInline
BenchmarkArrayCompareAllSameNoInline-8   	 2865070	       487.5 ns/op

We use "compare_native" to compare the bytes using architeture specific optimizations, but ASM commands cannot be inlined.

If the UUIDs are different, the overhead of the call is more significant than when they are the same at which the loop is more prominent.

I don't think there is a good way to inline ASM instructions as of today.

ALTree changed the title ~~affected/package: bytes - relatively poor performance in bytes.~~ bytes: relatively poor performance in bytes.Compare Sep 14, 2023

seankhliao added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bytes: relatively poor performance in bytes.Compare #62648

bytes: relatively poor performance in bytes.Compare #62648

pconstantinou commented Sep 14, 2023 •

edited

ALTree commented Sep 14, 2023

randall77 commented Sep 14, 2023

pconstantinou commented Sep 14, 2023

klauspost commented Sep 15, 2023

shoham-b commented Jan 2, 2024 •

edited

bytes: relatively poor performance in bytes.Compare #62648

bytes: relatively poor performance in bytes.Compare #62648

Comments

pconstantinou commented Sep 14, 2023 • edited

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

ALTree commented Sep 14, 2023

randall77 commented Sep 14, 2023

pconstantinou commented Sep 14, 2023

klauspost commented Sep 15, 2023

shoham-b commented Jan 2, 2024 • edited

pconstantinou commented Sep 14, 2023 •

edited

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

shoham-b commented Jan 2, 2024 •

edited