Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: intrinsify strings.Compare #61725

Closed
dsnet opened this issue Aug 2, 2023 · 25 comments
Closed

cmd/compile: intrinsify strings.Compare #61725

dsnet opened this issue Aug 2, 2023 · 25 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FeatureRequest NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Aug 2, 2023

A comment by @rsc in strings.Compare says:

This function does NOT call the runtime cmpstring function,
because we do not want to provide any performance justification for
using strings.Compare. Basically no one should use strings.Compare.
As the comment above says, it is here only for symmetry with package bytes.
If performance is important, the compiler should be changed to recognize
the pattern so that all code doing three-way comparisons, not just code
using strings.Compare, can benefit.

I'm not sure this comment has aged well with the recent change to slices.SortFunc,
where it now takes in an func(T, T) int instead of func(T, T) bool.
This means that the utility of strings.Compare suddenly shot up.

We should either intrinsify strings.Compare as runtime.cmpstring OR (even better) recognize this pattern.
That way, cmp.Compare on string types will benefit as well.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 2, 2023
@go101
Copy link

go101 commented Aug 3, 2023

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 3, 2023
@dr2chase
Copy link
Contributor

dr2chase commented Aug 3, 2023

@golang/compiler

@mknyszek mknyszek added this to the Backlog milestone Aug 9, 2023
@yuryu
Copy link
Contributor

yuryu commented Sep 26, 2023

Can we bring back this removed optimization? I'm hitting the same issue. slices.SortFunc needs a three-way comparison and there's no good way to do it right now.

https://go-review.googlesource.com/c/go/+/3012

@ncruces
Copy link
Contributor

ncruces commented Sep 28, 2023

Another vote for something that recognizes cmp.Compare and/or sequences of cmp.Less.

They seem to have been written in such a way that the optimizer can throw away all the NaN handling, yay! 🎉

But cmp.Compare gets particularly bad for strings: it doesn't even prioritize ==, like strings.Compare does.

Or should we also document cmp.Compare to say no one should call it? 🫤

@gopherbot
Copy link

Change https://go.dev/cl/532195 mentions this issue: strings: intrinsify and optimize Compare

@ncruces
Copy link
Contributor

ncruces commented Oct 3, 2023

Good. Yet, I'm a bit sad that this means giving up on Ordered having a decent implementation that's any good for strings.

It's a bit stunning to, for years, take the stance that you don't really need three-way compares, then build an entire 2nd search/sort library around three-way compares, and then fail to optimize the handful Ordered types for it, in effect pushing non-generic callbacks over operators for performance.

There are cases where three-way compares are not something to avoid, and leaving Ordered out of those is… sad.

@ncruces
Copy link
Contributor

ncruces commented Oct 3, 2023

If we're going to give up and not optimize string comparisons, and if we can't intrinsify/specialize cmp.Compare, I'd at least change the implementation to this:

func Compare[T Ordered](x, y T) int {
	if x == y {
		return 0
	}
	if x < y {
		return -1
	}
	if isNaN(x) {
		if isNaN(y) {
			return 0
		}
		return -1
	}
	return +1
}

It's slightly worse for ints, slightly better for floats (both hard to measure), and better for strings (mostly equivalent to the previous strings.Compare).

This is sorting a 10 million random element slice, with slices.SortFunc:

Benchmark_int_old-12            10000000               134.6 ns/op
Benchmark_int_new-12            10000000               138.5 ns/op
Benchmark_float_old-12          10000000               146.5 ns/op
Benchmark_float_new-12          10000000               143.4 ns/op
Benchmark_string_old-12         10000000               680.4 ns/op
Benchmark_string_new-12         10000000               596.4 ns/op
Benchmark_string_strings-12     10000000               588.2 ns/op

Edit: what would be a good benchmark so I could send this CL in?

ncruces added a commit to ncruces/go that referenced this issue Oct 3, 2023
Compare provides three-way comparison for Ordered types,
which certain algorithms benefit from.

Float comparisons require NaN handling that should be
optimized away for all other types.
String comparisons benefit from trying == first and
short-circuiting on different length strings.
Ideally the compiler would recognize the pattern and
specialize on strings, but this improves the status quo.

Benchmark shows:
- no change for ints,
- 20% improvement for floats, and
- 25% improvement for strings.

Bug: golang#61725
@gopherbot
Copy link

Change https://go.dev/cl/531997 mentions this issue: cmp: optimize Compare

@earthboundkid
Copy link
Contributor

earthboundkid commented Oct 12, 2023

With the existing slices.SortFunc and the coming cmp.Or, strings.Compare is more useful than it was pre-generics.

type Order struct {
	Product  string
	Customer string
	Price    float64
}
orders := []Order{ /* ... */ }
// ORDER BY Customer ASC, Product ASC, Price DESC
slices.SortFunc(orders, func(a, b Order) int {
	return cmp.Or(
		strings.Compare(a.Customer, b.Customer),  // could also be cmp.Compare if it can be equally optimized
		strings.Compare(a.Product, b.Product),
		cmp.Compare(b.Price, a.Price),
	)
})

@yuryu
Copy link
Contributor

yuryu commented Oct 12, 2023

I've updated examples to use cmp.Or as well as strings.Compare in http://go.dev/cl/532195. It's waiting for reviews.

@ncruces
Copy link
Contributor

ncruces commented Oct 12, 2023

I imagine you guys are fond of cmp.Or, but if we're discussing performance, I guess it should be noted that cmp.Or is not a short circuiting operator.

It's, curious that http://go.dev/cl/532195 changes ExampleSortFunc_multiField to use cmp.Or, but uses an if chain in BenchmarkSortFuncStruct to then claim performance.

PS: someone should go change package cmp's description, given cmp.Or has little to do with "comparing ordered values".

@yuryu
Copy link
Contributor

yuryu commented Oct 12, 2023

I just forgot to update the benchmark and didn't mean anything else. I can revert my change to not use cmp.Or later if cmp.Or is not short circuiting. It's a bit disappointing tbh because it looks very clean.

@ncruces
Copy link
Contributor

ncruces commented Oct 12, 2023

I'm all for optimizing strings.Compare and cmp.Compare, both as best as possible. And if strings.Compare can be made faster today, and cmp.Compare can't, so be it.

But I disagree with having package cmp push users to use strings.Compare for performance over clarity.

We go from one extreme (dissuading users from using strings.Compare by making it slow almost on purpose), to the other (recommending they go change every usage of cmp.Compare).

@earthboundkid
Copy link
Contributor

I agree that since cmp.Or isn’t short circuiting, it is a little weird to use it with strings.Compare to try to eke out a little performance that you just threw away. I still think it’s a pretty elegant way to write a struct comparison function though.

@aclements
Copy link
Member

@dr2chase , this may be an interesting thing to think about with the pure function analysis you've been experimenting with. If we knew or could tell that strings.Compare/cmp.Compare are pure, could we effectively compile a call to cmp.Or with a bunch of comparison arguments as a short circuited operation? Or maybe that whole approach is too implicit.

@aead
Copy link
Contributor

aead commented Nov 1, 2023

To summarize the situation:

Go does not compare strings in an optimal way. This is especially visible when sorting. As of now, the documentation in the sort packages incorrectly refers to the slices package to be more performant. In general, there are multiple ways to do the same thing and it is not clear/obvious what the performance differences are and which one to choose.

sort vs. slices

Consider the following benchmark:

func BenchmarkSort_10000(b *testing.B) {
	const N = 10000
	items := make([]string, N)
	for i := range items {
		items[i] = strings.Repeat("a", 100) + strconv.Itoa(i)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// slices.Sort(items)
		sort.Strings(items)
	}
}
goos: darwin
goarch: arm64
pkg: aead.dev/compare
             │ sort_slices.txt │          sort_strings.txt          │
             │     sec/op      │   sec/op     vs base               │
Sort_10000-8       95.26µ ± 3%   66.23µ ± 1%  -30.48% (p=0.002 n=6)

Using sort.Strings is ~30% faster than slices.Sort even though sort.Strings says:

// Strings sorts a slice of strings in increasing order.
//
// Note: consider using the newer slices.Sort function, which runs faster.

The perf. difference is caused by how slices.Sort compares strings. It uses a three-way comparison (x == y, x < y and y > x) while sort.Strings just uses a single less than (x < y).


slices.SortFunc

The same performance problem appears when using slices.SortFunc. Consider the following benchmark:

func BenchmarkSortFunc_10000(b *testing.B) {
	type Item struct {
		Value string
	}

	const N = 10000
	items := make([]*Item, N)
	for i := range items {
		items[i] = &Item{
			Value: strings.Repeat("a", 100) + strconv.Itoa(i),
		}
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		slices.SortFunc(items, func(a, b *Item) int {
			return strings.Compare(a.Value, b.Value)
		})
	}
}
goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 │ sort_func_strings.txt │
                 │        sec/op         │
SortFunc_10000-8             94.39µ ± 1%

The combination of slices.SortFunc and strings.Compare is also ~30% slower than sort.Strings for the same reason as above. See strings.Compare

	// NOTE(rsc): This function does NOT call the runtime cmpstring function,
	// because we do not want to provide any performance justification for
	// using strings.Compare. Basically no one should use strings.Compare.
	// As the comment above says, it is here only for symmetry with package bytes.
	// If performance is important, the compiler should be changed to recognize
	// the pattern so that all code doing three-way comparisons, not just code
	// using strings.Compare, can benefit.

Further, representing strings as []byte slices and sorting them via bytes.Compare is faster than slices.Sort and slices.SortFunc` with strings:

func BenchmarkSortFunc_10000(b *testing.B) {
	type Item struct {
		Value []byte
	}

	const N = 10000
	items := make([]*Item, N)
	for i := range items {
		items[i] = &Item{
			Value: []byte(strings.Repeat("a", 100) + strconv.Itoa(i)),
		}
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		slices.SortFunc(items, func(a, b *Item) int {
			return bytes.Compare(a.Value, b.Value)
		})
	}
}
goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 │ sort_func_strings.txt │        sort_func_bytes.txt         │
                 │        sec/op         │   sec/op     vs base               │
SortFunc_10000-8             94.39µ ± 1%   67.97µ ± 4%  -27.99% (p=0.002 n=6)

However, this is only the case when sorting existing []byte slices. Converting the strings to slices within the comparison function is significantly slower:

goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 │ sort_func_strings.txt │       sort_func_castbytes.txt        │
                 │        sec/op         │    sec/op     vs base                │
SortFunc_10000-8             94.39µ ± 1%   555.39µ ± 1%  +488.38% (p=0.002 n=6)

bytes.Compare vs. strings.Compare vs. cmp.Compare

The generic cmp.Compare function is even slower than strings.Compare, and therefore, also slower than bytes.Compare.

goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 │ sort_func_bytes.txt │       sort_func_strings.txt        │          sort_func_cmp.txt           │
                 │       sec/op        │   sec/op     vs base               │    sec/op     vs base                │
SortFunc_10000-8           67.97µ ± 4%   94.39µ ± 1%  +38.87% (p=0.002 n=6)   176.96µ ± 2%  +160.35% (p=0.002 n=6)

Possible solutions

While I agree with @rsc that the "best solution" would be the compiler recognizing the three-way comparison pattern for strings and emitting an memcmp/stringcmp intrinsic, I do not agree that strings.Compare should not be used and should be intentionally slower. A common use case for strings.Compare is sorting structs based on one of their inner fields:

type Person struct {
   Name string
   Age uint
}

The current situation is kind of unfortunate since:

  • sort.Strings is faster than slices.Sort while documentation recommends slices.Sort
  • there is no ergonomic way to use slices.SortFunc with strings that is as fast as sort.Strings
  • sorting strings represented as []byte slices (using slices.SortFunc and bytes.Compare) is faster than slices.Sort
  • using cmp.Compare is even slower than strings.Compare

As of now, the fastest way to sort a string slice is to use sort.Strings and there is no good solution for sorting structs based on one of their inner string fields using slices.SortFunc or sort.Slice. Both >= ~30% slower than sort.Strings.

@ncruces
Copy link
Contributor

ncruces commented Nov 1, 2023

We should make both as fast as possible, and stop recommending people use either because of performance-over-clarity.

@eliben
Copy link
Member

eliben commented Nov 7, 2023

To summarize the situation:

Go does not compare strings in an optimal way. This is especially visible when sorting. As of now, the documentation in the sort packages incorrectly refers to the slices package to be more performant. In general, there are multiple ways to do the same thing and it is not clear/obvious what the performance differences are and which one to choose.

sort vs. slices

[...]

I also want to add that I see some platform-dependent performance here; taking the correct string sorting benchmark from stdlib, I get better performance for slices.Sort on amd64 Linux (about 15% faster), but worse performance on arm64 darwin (about 50% slower).

@aead
Copy link
Contributor

aead commented Nov 7, 2023

I also want to add that I see some platform-dependent performance here

I suspect that this happens because the sorting implementation in slices is, in general, faster than sort.
With random strings, the difference between a single comparison and the three way comparison becomes blurry as well. The string comparison short-circuits on the first non-matching character. Hence, comparing random strings may be a good benchmark for sorting in general but it makes the performance penalty of comparing strings twice less visible.

Directly comparing strings.Compare and bytes.Compare using slices.SortFunc with the linked benchmark on arm64 darwin:

goos: darwin
goarch: arm64
pkg: aead.dev/sort
                    │ /tmp/bytes.txt │
                    │     sec/op     │
SlicesSortBytes-8        15.98m ± 1%
SlicesSortStrings-8      16.50m ± 0%
geomean                  16.24m

Sorting strings is slightly slower then sorting strings represented as []byte.

Adding a common prefix to all strings / byte slices increases the perf. difference:

goos: darwin
goarch: arm64
pkg: aead.dev/sort
                    │ /tmp/bytes.txt │
                    │     sec/op     │
SlicesSortBytes-8        28.20m ± 2%
SlicesSortStrings-8      33.96m ± 1%
geomean                  30.95m

@aead
Copy link
Contributor

aead commented Nov 7, 2023

A better benchmark for this issue may be:

func BenchmarkIsSorted_1000(b *testing.B) {
	const N = 1000
	items := make([]string, N)
	for i := range items {
		items[i] = strings.Repeat("a", 100) + strconv.Itoa(i)
	}
	slices.Sort(items)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// slices.IsSorted(items)
		sort.StringsAreSorted(items)
	}
}

The slices package uses cmp.Less while sort uses the Less function. The former compares two strings three times while the later compares them only once.

goos: darwin
goarch: arm64
pkg: aead.dev/sort
            │ /tmp/sort.txt │          /tmp/slices.txt           │
            │    sec/op     │   sec/op     vs base               │
Sort_1000-8     6.909µ ± 8%   9.573µ ± 2%  +38.56% (p=0.002 n=6)

@randall77
Copy link
Contributor

The former compares two strings three times while the later compares them only once.

This should no longer be true after https://go-review.googlesource.com/c/go/+/503795

@ncruces
Copy link
Contributor

ncruces commented Nov 7, 2023

It still compares them twice, and none of those 2 takes advantage of length being different to short circuit.

I tried to address that in https://go.dev/cl/531997 but results are inconsistent.

@randall77
Copy link
Contributor

It still compares them twice, and none of those 2 takes advantage of length being different to short circuit.

My point is that slices.Sort[string], which uses cmp.Less[string] under the hood, looks at each string once, not three times.
It is still true that slices.SortFunc[string], using cmp.Compare[string] as the comparison function, looks at each string twice (and not, as it was pre 503795, four times).

@go101
Copy link

go101 commented Nov 8, 2023

Note that, since toolchain v1.19, for unknown reasons, slice related benchmarks are often slice length sensitive. So it is best to use several different slices lengths in benchmarks.

@dmitshur dmitshur modified the milestones: Backlog, Go1.23 Apr 4, 2024
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 4, 2024
@gopherbot
Copy link

Change https://go.dev/cl/578835 mentions this issue: cmd/compile: remove redundant calls to cmpstring

gopherbot pushed a commit that referenced this issue Apr 19, 2024
The results of cmpstring are reuseable if the second call has the
same arguments and memory.

Note that this gets rid of cmpstring, but we still generate a
redundant </<= test and branch afterwards, because the compiler
doesn't know that cmpstring only ever returns -1,0,1.

Update #61725

Change-Id: I93a0d1ccca50d90b1e1a888240ffb75a3b10b59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/578835
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FeatureRequest NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
Development

No branches or pull requests