Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slices: add sorting and comparison functions #60091

Closed
ianlancetaylor opened this issue May 9, 2023 · 41 comments
Closed

slices: add sorting and comparison functions #60091

ianlancetaylor opened this issue May 9, 2023 · 41 comments

Comments

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented May 9, 2023

Update, May 18 2023: The current proposed API is #60091 (comment)


In #57433 we added the slices package to the standard library. In the discussion of that proposal, we decided to delay adding the sorting and comparison functions that exist in golang.org/x/exp/slices, pending a decision on the constraints package.

In #59488 we proposed adding a new cmp package, which will define cmp.Ordered as a constraint matching all ordered types.

This proposal is to add the sorting functions to the slices package in the standard library, assuming that #59488 is adopted. If #59488 is declined, then this proposal should be declined.

The proposed new API (already in golang.org/x/exp/slices) is:

// Compare compares the elements of s1 and s2.
// The elements are compared sequentially, starting at index 0,
// until one element is not equal to the other.
// The result of comparing the first non-matching elements is returned.
// If both slices are equal until one of them ends, the shorter slice is
// considered less than the longer one.
// The result is 0 if s1 == s2, -1 if s1 < s2, and +1 if s1 > s2.
// Comparisons involving floating point NaNs are ignored.
func Compare[E cmp.Ordered](s1, s2 []E) int

// CompareFunc is like Compare but uses a comparison function
// on each pair of elements. The elements are compared in increasing
// index order, and the comparisons stop after the first time cmp
// returns non-zero.
// The result is the first non-zero result of cmp; if cmp always
// returns 0 the result is 0 if len(s1) == len(s2), -1 if len(s1) < len(s2),
// and +1 if len(s1) > len(s2).
func CompareFunc[E1, E2 any](s1 []E1, s2 []E2, cmp func(E1, E2) int) int

// Sort sorts a slice of any ordered type in ascending order.
// Sort may fail to sort correctly when sorting slices of floating-point
// numbers containing Not-a-number (NaN) values.
// Use slices.SortFunc with an appropriate comparison function if the input may contain NaNs.
func Sort[E cmp.Ordered](x []E)

// SortFunc sorts the slice x in ascending order as determined by the less function.
// This sort is not guaranteed to be stable.
//
// SortFunc requires that less is a strict weak ordering.
// See https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings.
func SortFunc[E any](x []E, less func(a, b E) bool)

// SortStableFunc sorts the slice x while keeping the original order of equal
// elements, using less to compare elements.
func SortStableFunc[E any](x []E, less func(a, b E) bool)

// IsSorted reports whether x is sorted in ascending order.
func IsSorted[E cmp.Ordered](x []E) bool

// IsSortedFunc reports whether x is sorted in ascending order, with less as the
// comparison function.
func IsSortedFunc[E any](x []E, less func(a, b E) bool) bool

// BinarySearch searches for target in a sorted slice and returns the position
// where target is found, or the position where target would appear in the
// sort order; it also returns a bool saying whether the target is really found
// in the slice. The slice must be sorted in increasing order.
func BinarySearch[E cmp.Ordered](x []E, target E) (int, bool)

// BinarySearchFunc works like BinarySearch, but uses a custom comparison
// function. The slice must be sorted in increasing order, where "increasing"
// is defined by cmp. cmp should return 0 if the slice element matches
// the target, a negative number if the slice element precedes the target,
// or a positive number if the slice element follows the target.
// cmp must implement the same ordering as the slice, such that if
// cmp(a, t) < 0 and cmp(b, t) >= 0, then a must precede b in the slice.
func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)
@gopherbot gopherbot added this to the Proposal milestone May 9, 2023
@ianlancetaylor
Copy link
Contributor Author

CC @eliben

@rsc
Copy link
Contributor

rsc commented May 10, 2023

These are the ones from x/exp/slices, which we've discussed at length before. The main blocker is how to spell 'cmp.Ordered'. For this discussion let's assume 'cmp.Ordered' is the spelling and not focus on that.

Are there any concerns about the APIs otherwise?

@willfaught
Copy link
Contributor

Sorry if this was explained elsewhere, but why isn't there a SortStable?

@eliben
Copy link
Member

eliben commented May 10, 2023

@willfaught see the paper trail from #47619 (comment)

@zephyrtronium
Copy link
Contributor

Perhaps nitpicking, but the comment on Sort regarding NaNs seems a bit prescriptive considering that we now have math.Compare and math.Compare32.

@ianlancetaylor
Copy link
Contributor Author

@zephyrtronium Just to be clear, are you suggesting this:

// Use slices.SortFunc(x, func(a, b float64) bool {return math.Compare(a, b) < 0 })

@zephyrtronium
Copy link
Contributor

More precisely, I am pointing out that the comment as written only says to use a < b || (math.IsNaN(a) && !math.IsNaN(b)), when math.Compare(a, b) < 0 is another way that exists. Each has different semantics and different advantages, but only one way is mentioned. Again, I think I am nitpicking, but it seems preferable to me to leave that sentence out of the documentation and leave the preferred approach(es) to examples.

@ianlancetaylor
Copy link
Contributor Author

Thanks. Updated the initial comment.

@rsc
Copy link
Contributor

rsc commented May 10, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@willfaught
Copy link
Contributor

willfaught commented May 11, 2023

see the paper trail from #47619 (comment)

@eliben This seems to be the relevant comment:

One caveat: we chose to exclude SortStable based on @dsnet 's comments; the version with constraints.Ordered is indistinguishable from Sort because two elements that sort equal are otherwise indistinguishable for practical purposes.

However, it doesn't link to the other comment it references. Can you summarize it? This seems to be saying that the Sort func already does a stable sort. If that's the case, then don't we need an unstable sort func as well?

@eliben
Copy link
Member

eliben commented May 11, 2023

However, it doesn't link to the other comment it references. Can you summarize it? This seems to be saying that the Sort func already does a stable sort. If that's the case, then don't we need an unstable sort func as well?

It refers to this comment: #47619 (comment) (GitHub's auto-hiding comments in long threads interferes with the bread crumbing). It has to do with what values Ordered actually covers and the logical equivalence of stable and unstable sort for such values. See https://pkg.go.dev/golang.org/x/exp/constraints#Ordered for all the types - what does "stable" sorting mean for such values?

@Merovius
Copy link
Contributor

I once again would like to advocate to use a single type of comparison function - i.e. to either make SortFunc accept a cmp function, or to make BinarySearchFunc accept a less function. Personally, I think the former is better, as implementing less in terms of cmp is cheaper than vice-versa.

Having two flavors of comparison function means you have to implement both per type, or you have to spell out a wrapper every time you need to go from one to the other, or you have to have a generic function that can transform one into the other (and the latter two probably add function call overhead). ISTM that ~any use of BinarySearchFunc has to rely on at least one use of Sort, so the use cases seem fairly coupled.

@eliben
Copy link
Member

eliben commented May 11, 2023

I once again would like to advocate to use a single type of comparison function - i.e. to either make SortFunc accept a cmp function, or to make BinarySearchFunc accept a less function. Personally, I think the former is better, as implementing less in terms of cmp is cheaper than vice-versa.

FWIW this dual use is carried over from the sort package (https://pkg.go.dev/sort), which has cmp for Find (the spiritual ancestor of BinarySearchFunc) and Less for Sort (https://pkg.go.dev/sort#Interface).

So it seems like a bit of a conundrum; having just cmp for slices would make it more internally consistent, but on the other hand would also make the contract subtly differ from the sort package for sorting, creating another kind of inconsistency.

@Merovius
Copy link
Contributor

Merovius commented May 11, 2023

@eliben IMO adding a new package, with generics no less, frees us from this historical burden, if we choose so.

@Merovius
Copy link
Contributor

Merovius commented May 11, 2023

Also, FWIW, golang.org/x/exp/slices.BinarySearchFunc explicitly moved from less to cmp, pretty much at the same time sort.Find was added. So that still wouldn't explain why two different flavors where used to begin with, if consistency was the goal ([edit] the proposal answers that question [/edit]).

@rsc
Copy link
Contributor

rsc commented May 15, 2023

I don't believe we should only provide the func versions. It happens frequently that you want to sort or search a slice by the usual < order provided by the language. Providing that directly without having to say ", cmp.Less" is more convenient and can be made far more efficient. This also follows other APIs in the language that provide a reasonable default behavior but then also have the extensible Func version, like strings.Index vs strings.IndexFunc.

@rsc
Copy link
Contributor

rsc commented May 15, 2023

Along these same lines, after the discussion of min and max on #59488, we should probably provide slices.Min and slices.Max here, alongside slices.Sort, since Min and Max are the other natural operations that would use Ordered.
The specific signatures would be:

func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E
func MinFunc[E any](x []T, less func(E, E) bool) E
func MaxFunc[E any](x []T, less func(E, E) bool) E

If these are invoked with an empty list, they panic("slices.Min: min of empty list") and so on.

@jimmyfrasche
Copy link
Member

What do they return when len(x) == 0?

@rsc
Copy link
Contributor

rsc commented May 15, 2023

Updated: they panic.

@magical
Copy link
Contributor

magical commented May 15, 2023

If we are going to add slices.Min/Max then i would suggest also including variants that return the index of the element instead of the element itself. (These are sometimes called argmax/argmin.)

// {Min,Max}Index{,Func} returns the index of the leftmost smallest/largest element of [slice],
// according to the < operator or less function.
// Panics if [slice] is empty.
func MinIndex[E cmp.Ordered](slice []E) int
func MaxIndex[E cmp.Ordered](slice []E) int
func MinIndexFunc[E any](slice []E, less func(E, E) bool) int
func MaxIndexFunc[E any](slice []E, less func(E, E) bool) int

Alternatively, maybe Min/Max could return both the element and its index?

@eliben
Copy link
Member

eliben commented May 16, 2023

Along these same lines, after the discussion of min and max on #59488, we should probably provide slices.Min and slices.Max here, alongside slices.Sort, since Min and Max are the other natural operations that would use Ordered. The specific signatures would be:

func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E
func MinFunc[E any](x []T, less func(E, E) bool) E
func MaxFunc[E any](x []T, less func(E, E) bool) E

If these are invoked with an empty list, they panic("slices.Min: min of empty list") and so on.

I'll look into adding these to exp/slices as a prototype for now

@mateusz834
Copy link
Member

@rsc
Small typo there, should be:

func MinFunc[E any](x []E, less func(E, E) bool) E
func MaxFunc[E any](x []E, less func(E, E) bool) E

@gopherbot
Copy link

Change https://go.dev/cl/495195 mentions this issue: slices: add Min/Max/MinFunc/MaxFunc

@eliben
Copy link
Member

eliben commented May 16, 2023

If we are going to add slices.Min/Max then i would suggest also including variants that return the index of the element instead of the element itself. (These are sometimes called argmax/argmin.)

// {Min,Max}Index{,Func} returns the index of the leftmost smallest/largest element of [slice],
// according to the < operator or less function.
// Panics if [slice] is empty.
func MinIndex[E cmp.Ordered](slice []E) int
func MaxIndex[E cmp.Ordered](slice []E) int
func MinIndexFunc[E any](slice []E, less func(E, E) bool) int
func MaxIndexFunc[E any](slice []E, less func(E, E) bool) int

Alternatively, maybe Min/Max could return both the element and its index?

IMHO Min and Max should return a single result, this makes them more usable in mathematical expressions. Moreover, I suspect that in the vast majority of cases users are interested in the min/max itself, not the index -- keeping only that will make these functions faster.

I don't object to adding Argmin/Argmax or *Index, but this can be done separately and maybe later as we assess demand.

@zephyrtronium
Copy link
Contributor

Personally, I'd prefer argmin and argmax over min and max themselves:

  • Going from argmin to min is simply s[slices.MinIndex(s)].
  • Argmin has a reasonable non-panic behavior for empty slices: return -1. If you use the above expression directly, you still get a panic. (Granted, checking for len(s) == 0 first also isn't hard.)
  • Anecdotally, I've needed the indices of extrema more often than their values. I think that is mostly from a time when I worked on scientific visualization software where it was useful to highlight the location of a maximum in a plot.

It isn't hard to imagine that I'm in the minority camp, though. 🌝

@rsc
Copy link
Contributor

rsc commented May 17, 2023

For concreteness, I believe this is the API being proposed:

package slices

func Compare[E cmp.Ordered](s1, s2 []E) int
func Sort[E cmp.Ordered](x []E)
func IsSorted[E cmp.Ordered](x []E) bool
func BinarySearch[E cmp.Ordered](x []E, target E) (int, bool)
func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E

func CompareFunc[E1, E2 any](s1 []E1, s2 []E2, cmp func(E1, E2) int) int
func SortFunc[E any](x []E, less func(a, b E) bool)
func SortStableFunc[E any](x []E, less func(a, b E) bool)
func IsSortedFunc[E any](x []E, less func(a, b E) bool) bool
func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)
func MinFunc[E any](x []T, less func(E, E) bool) E
func MaxFunc[E any](x []T, less func(E, E) bool) E

There was a typo in the result of CompareFunc (it had no result), which I've fixed here and above.

@eliben
Copy link
Member

eliben commented May 17, 2023

@rsc there's a discussion in https://go-review.googlesource.com/c/exp/+/495195 about handling of NaNs in the newly added Min and Max functions.

Currently the Sort function in this package doesn't have any special NaN handling (since it's generic and written once for all Ordered types), so NaNs in a slice essentially lead to undefined sorting order (depending on their position, etc.) - with a comment:

// Sort may fail to sort correctly when sorting slices of floating-point
// numbers containing Not-a-number (NaN) values.
// Use slices.SortFunc(x, func(a, b float64) bool {return a < b || (math.IsNaN(a) && !math.IsNaN(b))}

A straightforward implementation of Min and Max will have the same issue, and may need a similar comment warning.

This could also be a topic for the proposal review committee to consider.

@eliben
Copy link
Member

eliben commented May 18, 2023

Thanks @rsc - I've updated the CL for x/exp/slices to follow this suggestion. I agree that distinguishing -0 vs. +0 in generic code will be tricky (math does it by checking Signbit).

@rsc
Copy link
Contributor

rsc commented May 19, 2023

Replying to @Merovius's suggestion about cmp vs less, yes, let's switch to cmp consistently throughout.
The updated API, then, is:

package slices

func Compare[E cmp.Ordered](s1, s2 []E) int
func Sort[E cmp.Ordered](x []E)
func IsSorted[E cmp.Ordered](x []E) bool
func BinarySearch[E cmp.Ordered](x []E, target E) (int, bool)
func Min[E cmp.Ordered](x []E) E
func Max[E cmp.Ordered](x []E) E

func CompareFunc[E1, E2 any](s1 []E1, s2 []E2, cmp func(E1, E2) int) int
func SortFunc[E any](x []E, cmp func(E, E) int)
func SortStableFunc[E any](x []E, cmp func(E, E) int)
func IsSortedFunc[E any](x []E, cmp func(E, E) int) bool
func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)
func MinFunc[E any](x []E, cmp func(E, E) int) E
func MaxFunc[E any](x []E, cmp func(E, E) int) E

(Only the second half has changed.)

The argument in favor of cmp is consistency within package slices. Now if you are sorting a slice and then doing a binary search over it, you only need to write one comparison function and use it in both calls, instead of having to write two different ones.

The primary argument in favor of less is consistency with package sort. This argument doesn't hold up, for two reasons. First, if you are using a slice, you can do everything within package slices; you'll never import sort. Second, and more importantly, the less function signature used by sort is different from the one we were using here. Package sort's less takes indices, while package slices's less takes elements. So you can't use the same function with both packages, and knowing how to write a function for one package does not immediately mean you know how to write one for package slices. So this argument doesn't hold up.

A secondary argument in favor of less is that less functions tend to be shorter than cmp functions. That's true but being able to use the same function with all these different public APIs certainly beats that. If you were really attached to writing less functions you could write a generic converter. (Even so I wouldn't put one in the standard library since it would encourage inefficient code.)

The argument in favor of cmp is much more compelling than either of these, so consider it changed.

@dsnet
Copy link
Member

dsnet commented May 19, 2023

A secondary argument in favor of less is that less functions tend to be shorter than cmp functions.

This argument isn't all that compelling anymore, either. The main complexity of implementing cmp functions is the switch needed to produce ternary results. But with the addition of cmp.Compare, that becomes a single call to a generic helper function. More complex data structures will then be implemented in terms of cmp.Compare or slices.Compare, with complexity approximately equal to the less variant.

@gopherbot
Copy link

Change https://go.dev/cl/496078 mentions this issue: slices: add sorting and comparison functions

gopherbot pushed a commit that referenced this issue May 23, 2023
Now that the `cmp` package exists, sorting and comparison functions from
`x/exp/slices` can be ported to the standard library, using the
`cmp.Ordered` type and the `cmp.Less` and `cmp.Compare` functions.

This move also includes adjustments to the discussions in #60091 w.r.t.
NaN handling and cmp vs. less functions, and adds Min/Max functions.
The final API is taken from
#60091 (comment)

Updates #60091

Change-Id: Id7e6c88035b60d4ddd0c48dd82add8e8bc4e22d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/496078
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
Run-TryBot: Eli Bendersky‎ <eliben@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
@rsc
Copy link
Contributor

rsc commented May 24, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: slices: add sorting and comparison functions slices: add sorting and comparison functions May 24, 2023
@rsc rsc modified the milestones: Proposal, Backlog May 24, 2023
@eliben
Copy link
Member

eliben commented May 24, 2023

Implemented in https://go.dev/cl/496078

@eliben eliben closed this as completed May 24, 2023
@gopherbot
Copy link

Change https://go.dev/cl/498175 mentions this issue: doc: add release notes for additions to the slices package

gopherbot pushed a commit that referenced this issue May 25, 2023
Updates #60091

Change-Id: I7438811f4e41a2977acbb5ac74c22a02c28c6597
Reviewed-on: https://go-review.googlesource.com/c/go/+/498175
Reviewed-by: Eli Bendersky <eliben@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Eli Bendersky‎ <eliben@golang.org>
Run-TryBot: Eli Bendersky‎ <eliben@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
@dmitshur dmitshur modified the milestones: Backlog, Go1.21 May 30, 2023
@gopherbot
Copy link

Change https://go.dev/cl/505796 mentions this issue: slices: clarify MinFunc/MaxFunc result for equal elements

github-actions bot pushed a commit to golangFame/go that referenced this issue Jun 24, 2023
They should return the first of equal elements. No such clarification
is required for Min/Max as for them equal elements are indistinguishable.

For golang#60091

Change-Id: Iad58115d482add852c811e993131702b5b3bec5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/505796
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Eli Bendersky <eliben@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
bradfitz pushed a commit to tailscale/go that referenced this issue Jul 15, 2023
They should return the first of equal elements. No such clarification
is required for Min/Max as for them equal elements are indistinguishable.

For golang#60091

Change-Id: Iad58115d482add852c811e993131702b5b3bec5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/505796
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Eli Bendersky <eliben@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Now that the `cmp` package exists, sorting and comparison functions from
`x/exp/slices` can be ported to the standard library, using the
`cmp.Ordered` type and the `cmp.Less` and `cmp.Compare` functions.

This move also includes adjustments to the discussions in golang#60091 w.r.t.
NaN handling and cmp vs. less functions, and adds Min/Max functions.
The final API is taken from
golang#60091 (comment)

Updates golang#60091

Change-Id: Id7e6c88035b60d4ddd0c48dd82add8e8bc4e22d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/496078
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
Run-TryBot: Eli Bendersky‎ <eliben@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
They should return the first of equal elements. No such clarification
is required for Min/Max as for them equal elements are indistinguishable.

For golang#60091

Change-Id: Iad58115d482add852c811e993131702b5b3bec5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/505796
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Eli Bendersky <eliben@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Now that the `cmp` package exists, sorting and comparison functions from
`x/exp/slices` can be ported to the standard library, using the
`cmp.Ordered` type and the `cmp.Less` and `cmp.Compare` functions.

This move also includes adjustments to the discussions in golang#60091 w.r.t.
NaN handling and cmp vs. less functions, and adds Min/Max functions.
The final API is taken from
golang#60091 (comment)

Updates golang#60091

Change-Id: Id7e6c88035b60d4ddd0c48dd82add8e8bc4e22d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/496078
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
Run-TryBot: Eli Bendersky‎ <eliben@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
They should return the first of equal elements. No such clarification
is required for Min/Max as for them equal elements are indistinguishable.

For golang#60091

Change-Id: Iad58115d482add852c811e993131702b5b3bec5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/505796
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Eli Bendersky <eliben@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests