sort: add Find #50340

robpike · 2021-12-24T23:35:46Z

I have never used sort.Search to build a sorted list (slice). It's too expensive and there are better ways. However, I use it often to do what its name suggests: search.

The API is designed for the growth case, which is fine and general, but the general (growth) case is rare, perhaps very rare, and handling the return value is clumsy†.

I suggest adding another function, say sort.Contains, that just says yes or no to the question of presence of the element. Here is sort.Search's use for the pure search case, right from the documentation:

i := sort.Search(len(data), func(i int) bool { return data[i] >= x })
        if i < len(data) && data[i] == x {
        	// x is present at data[i]
        } else {
        	// x is not present in data,
        	// but i is the index where it would be inserted.
        }

Besides the messy handling, that extra comparison has always bothered me.

Here is what that would look like with the proposed function:

 if sort.Contains(len(data), func(i int) bool { return data[i] >= x }) {
      	// x is present at data[i]
}

This seems so much more intuitive and semantically lightweight. If you needed the index - but why would you if you weren't going to insert there? - you could go back to the existing Search function.

It should be very simple to implement by having both Search and Contains call a helper that reports presence, and returning the appropriate result, or by just writing Contains itself - like Search, it's very short.

I already filled in the proposal questionnaire here: #45624

† And poorly documented, but that is fixable.

The text was updated successfully, but these errors were encountered:

robpike · 2021-12-24T23:36:45Z

By the way, even in the event type parameters cause a rethink of this package, a lighter-weight Contains function would be nice to have.

DeedleFake · 2021-12-26T14:57:14Z

How would this work? I was under the impression that the reason for the secondary data[i] == x check is because the f function doesn't necessarily provide enough information to be sure if the value at i is actually the one being searched for or not. Since sort.Contains() doesn't actually return i, how would you be sure if the value was correct or not?

I'm also curious what the better way to build a sorted slice is that you mentioned. I have used sort.Search() to do so, many times. The only way I can think of that's even equivalent would be to build a heap and then pop the whole thing once you're done adding stuff to it, but just off the top of my head both seem to be O(n * log(n)) to me.

Edit: Actually, I think that the heap method would be better as it doesn't require as many copies, and potentially allocations, when adding elements thanks to not having to insert into the middle of a slice. It does use interfaces if using container/heap, though, which is a whole different type of overhead.

robpike · 2021-12-26T22:32:27Z

Whatever else it might do, it can do another test. It only has <= but two calls of that will tell it == (NaNs aside, but we're already on thin ice with NaNs. They're not even numbers.)

In case it's not clear from what I wrote above, that final test (or two) may be necessary, but I want the function to do it, not the caller.

DeedleFake · 2021-12-27T07:16:20Z

As proposed, there's no way for the function to determine actual equality. The function only takes the index to check against and, therefore, knows nothing about the value actually being checked. For example, given f is func(i int) { return data[i] >= 3 }, consider the the slices []int{1, 2, 3, 4} and []int{1, 2, 4, 5}. f(i) returns the exact same values for all indices in both cases, but the value is actually only present in one of the slices. Unless I'm somehow just completely missing something here, the API proposed does not seem capable of determining for sure whether or not the value being searched for is actually in the slice.

It only has <= but two calls of that will tell it ==

It doesn't have <=, though. It only has x <=. The x is built in to the function passed in. Without the ability change the order of the two things being compared, it can't be used to determine equality.

The only mention that the proposal seems to make for separately determining equality is

It should be very simple to implement by having both Search and Contains call a helper that reports presence, and returning the appropriate result, or by just writing Contains itself - like Search, it's very short.

but it's not clear what exactly that would look like. It sounds like you're saying that sort.Contains() needs a second function passed to it, maybe func(i int) bool, that it will call to determine the final equality. If so, that would work, but I don't see a whole lot of advantage over the existing sort.Search(), and having to pass two closures to sort.Contains() seems more cumbersome to me, not less.

This

present := sort.Contains(
  len(data),
  func(i int) bool { return data[i] >= x },
  func(i int) bool { return data[i] == x },
)
if present {
  // x is present in data
}

doesn't seem much better to me than this

i := sort.Search(len(data), func(i int) bool { return data[i] >= x })
if (i < len(data)) && (data[i] == x) {
  // x is present at data[i]
}

Edit: It occurs to me that this is fairly straightforwards with generics:

Playground Link

func Contains[T any, S ~[]T](s S, x T, less func(v1, v2 T) bool) bool {
	i := sort.Search(len(s), func(i int) bool { return less(x, s[i]) }) - 1
	return (i >= 0) && !less(s[i], x)
}

This is limited to slices, but that probably covers 99% of cases. A more complicated one that covers other cases is also possible, but probably not worth the trouble.

Edit 2: Fix an instance of return condition ? true : false. Sigh... That's what I get for writing code at 2:00 A.M.

robpike · 2021-12-27T09:43:11Z

I had forgotten the challenge of moving the equality check away from the caller. I hadn't thought it through - even though I remember talking to @griesemer about the challenge when he was designing it. Mea ignoramica.

But that affects only the mechanism, not my desire, which is to simplify the API. I like your solution, at least as a discussion point. I can't help feel there's a clean API trying to be found. C's bsearch isn't all that nice to use either, though.

robpike · 2021-12-27T09:48:16Z

There's even this nice (untested) possibility for the simplest and possibly commonest case, if we can choose a different name for it.

func Lookup[T constraints.Comparable, S ~[]T](s S, x T) bool {
	i := sort.Search(len(s), func(i int) bool { return x <= s[i] })
	return i < len(s) && x==s[i]
}

mlevieux · 2022-01-03T15:56:37Z

I'm probably about to say something completely wrong, but isn't the problem solved by asking for a function able to report for multiple states? Like here we're focusing on a closure like func (i int) bool, but we could totally go for func (i int) int that'd be a comparator function so -1 on <, 0 on == and 1 on >. So sort.Contains would take this function and return true if it could compare some value for equality, false otherwise. This makes writing the closure longer and probably a little-less readable. But it also has the advantage of letting the caller define the equality relations according the use case, which is not possible through the use of constraints.Comparable.

rsc · 2022-01-12T19:01:39Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2022-01-19T18:13:50Z

It sounds like we agree that sort.Search is hard to use, but there's not an obvious replacement. In particular, anything that can report an exact position has to take a different function (a 3-way result like strings.Compare, or multiple functions).

For sake of argument, we could do:

func Find(n int, cmp func (i int) int) int

where the cmp function returns <0, 0, or >0 to say whether the item being sought is less than, equal to, or greater than element at position i.

(In comparison, sort.Search's f reports whether the item being sought is less than the element at position i; that is, it's Less(item, elem[i]), whereas Find takes Cmp(item, elem[i]).)

Is that something we should consider adding?

rsc · 2022-01-19T18:15:28Z

When we do resolve this discussion, perhaps we should revisit #47619 (comment).

robpike · 2022-01-20T00:52:48Z

@rsc I am in favor of adding a function with the signature you suggest, and happy with the name.

rsc · 2022-01-26T18:24:32Z

Sounds like people are generally in favor of adding sort.Find.
It looks like we'd need to add slices.BinaryFind and slices.BinaryFindFunc as well.
The difference between Search and Find in all cases would be that Find returns -1 when the element is not present,
whereas Search returns the index in the slice where it would be inserted.

Do I have that right? Does anyone object to adding Find (and slices.BinaryFind and slices.BinaryFindFunc)?

rogpeppe · 2022-01-27T18:33:50Z

I am at least partially responsible for the design of the original API and I also take Rob's view that it's needlessly awkward in the common case (it's also annoying because it's backward from most of the rest of the sort package's API - you have to think in terms of greater-than rather than less-than, but that's another matter).

where the cmp function returns <0, 0, or >0 to say whether the item being sought is less than, equal to, or greater than element at position i.

I think I'd find it be more intuitive to say that cmp returned whether the element is <0, 0 or >0 than the item being sought ("how does element i compare to what I'm looking for?"), but that would probably be an unwarrantable difference in approach from sort.Search.

What would the signature of BinaryFind look like? I'm thinking something like:

func BinaryFind[T any](x []T, item T, func(T, T) int) int

for example:

BinaryFind([]string{"a", "b", "c"}, "b", strings.Compare)

Maybe a good reason to add a time.Time.Compare method:

BinaryFind([]time.Time{...}, t, time.Time.Compare)

ianlancetaylor · 2022-01-27T21:05:51Z

I think the analogous slices signatures would be

func BinaryFind[E comparable](s []E, E) int
func BinaryFindFunc[E any](s []E, func(E) int) int

It's a reasonable question to ask whether BinaryFindFunc should instead be

func BinaryFindFunc[E any](s []E, v E, func(E, E) int) int

I'm not sure. I think the main argument for the latter would be that we can pass functions like bytes.Compare, as you suggest.

rogpeppe · 2022-01-27T23:10:42Z

I think the analogous slices signatures would be
func BinaryFind[E comparable](s []E, E) int
func BinaryFindFunc[E any](s []E, func(E) int) int
It's a reasonable question to ask whether BinaryFindFunc should instead be
func BinaryFindFunc[E any](s []E, v E, func(E, E) int) int
I'm not sure. I think the main argument for the latter would be that we can pass functions like bytes.Compare, as you suggest.

The main reason from my point of view is that the item that's being searched for is rarely a constant, so the former always in practice requires the caller to create a closure that refers to that value, which is unwieldy and not really that much easier to use than sort.Find would be, I think.

Example calls:

	i = sort.Find(len(xs), func(i int) int {
		return strings.Compare(x, xs[i])
	})
	i = slices.FindFunc1(xs, func(y string) int {
		return strings.Compare(x, y)
	})
	i = slices.FindFunc2(xs, x, strings.Compare)

The latter is the clear winner, I think - it's shorter and easier to get right. With the closure versions,
you have to remember the order of the comparison - strings.Compare(y, x) would
be just as possible, but a function of the form func(T, T) int has a natural ordering.

I suspect it might end up more efficient too, because FindFunc is unlikely to be inlined, so
the closure wouldn't be inlined, so there would be an extra indirect function call on each comparison AFAICS.

rsc · 2022-02-02T18:13:06Z

The three-argument FindFunc (FindFunc2 above) would be different from the other Funcs we have, like IndexFunc. It doesn't make sense to me to diverge from those.

Should we leave BinaryFindFunc out until we are more sure it's needed?

DeedleFake · 2022-02-02T19:26:26Z

I feel like the FindFunc2() signature is just a workaround for the unwieldy nature of simple closures. Maybe something like that should be reconsidered in the context of #21498? A shorter syntax would make that entirely unnecessary, in my opinion.

rogpeppe · 2022-02-03T10:43:07Z

The three-argument FindFunc (FindFunc2 above) would be different from the other Funcs we have, like IndexFunc. It doesn't make sense to me to diverge from those.

I'd suggest that FindFunc is different from IndexFunc in that IndexFunc is only rarely used to find a locally defined value.

A quick scan of a random corpus of a few million lines of code turned up 20 uses of IndexFunc, of which only one used a closure to define the value being searched for. All the rest were similar to strings.IndexFunc(s, unicode.IsSpace) - that is, the predicate function was independently defined only in terms of its argument.

One of the nice things about IndexFunc is that it can be used with many existing "out of the box" functions (unicode.IsSpace being just one example). With FindFunc1 above, that's not the case - you will always need to define your own closure, and there's always to possibility of getting the argument ordering wrong and the potential subtle bugs that could arise as a result:

i = slices.FindFunc1(xs, func(y string) int {
	return strings.Compare(y, x)   // WRONG!
})

On the other hand, with a two-argument form, many existing functions and methods can be used as they are:

slices.FindFunc2(byteSlices, byteSlice, bytes.Compare)
slices.FindFunc2(ipAddrs, ipAddr, netip.Addr.Compare)
slices.FindFunc2(semvers, version, semver.Compare)
slices.FindFunc2(bigInts, i, (*big.Int).Cmp)

I'd argue that passing a compare function is more natural than passing a single argument function, has less potential for subtle bugs, and makes better use of existing available functions and methods.

I don't believe this is about closures being unwieldy. Even if closures were more syntactically lightweight, you'd still need to remember to pass the arguments to the underlying comparison function (and there will usually be an underlying comparison function, I suspect) in the right order.

rogpeppe · 2022-02-03T12:27:12Z

One other thing that's worth clarifying: currently sort.Search is guaranteed to find the smallest value such that f(i) is true, which means if there are multiple exact matches, it will always return the earliest one.

With sort.Find, we could potentially return immediately when the compare function returns 0. That's more efficient but a bit inconsistent with sort.Search. Personally I don't think that matters (if someone really needs the very first instance in a run, they can still use sort.Search), but opinions might well vary. WDYT?

ianlancetaylor · 2022-02-03T20:05:07Z

I think sort.Find should be defined as returning any arbitrary index for which the comparison function returns zero.

rsc · 2022-02-09T18:33:25Z

Okay, so it sounds like we all agree on sort.Find.

slices.Anything is a bit harder. It sounds like maybe the signature should be different from sort.Find.
Also BinaryFind is not a real thing like binary search is.

Perhaps slices.BinarySearchFunc should change to be the FindFunc2 we've been talking about.
Specifically, right now we have:

slices.BinarySearch(slice []T, target T)
slices.BinarySearchFunc(slice []T, less func(value T) bool)

and maybe we should change it to:

slices.BinarySearch(slice []T, target T)
slices.BinarySearchFunc(slice []T, target T, cmp func(T, T) int)

That would line up well with what other languages do and it avoids the problems @rogpeppe points out.
And we'd be back down to having just one.

Thoughts?

ianlancetaylor · 2022-02-09T18:33:34Z

CC @eliben

eliben · 2022-02-10T00:14:21Z

I agree with decoupling the interfaces the sort and slices packages. The former works on abstract indices, whereas the latter works on actual slices.

The changes to slices sound alright to me; to be clear, there's no change to BinarySearch, but there is change to the semantics of BinarySearchFunc, from the current semantics of taking ok func(Elem) bool where it finds and returns the smallest index i at which element ok returns true for element i. The new semantics will be taking cmp func(T, T) int and returning the smallest index i at which cmp returns 0?

I don't have a preference w.r.t. sort package, other than to express that having both Search and Find functions with subtly different semantics in it somewhat confusing.

rogpeppe · 2022-02-10T08:19:27Z

The new semantics will be taking cmp func(T, T) int and returning the smallest index i at which cmp returns 0?

I believe as per this comment the new semantics will be returning the first (arbitrary) index i encountered for which cmp returns 0.

eliben · 2022-03-24T12:38:26Z

I'll take a stab at implementing the slices functions first.

gopherbot · 2022-03-24T13:42:41Z

Change https://go.dev/cl/395414 mentions this issue: slices: rework the APIs of BinarySearch*

For golang/go#50340 Change-Id: If115b2b66d463d5f3788d017924f8dd38867551c Reviewed-on: https://go-review.googlesource.com/c/exp/+/395414 Reviewed-by: Ian Lance Taylor <iant@golang.org> Trust: Eli Bendersky‎ <eliben@golang.org>

eliben · 2022-03-28T17:58:51Z

Now that the new functions have landed in exp/slices, I can work on sort.Find as described in #50340 (comment)

Unless @robpike you have a prototype already that you want to send...

gopherbot · 2022-03-29T19:54:45Z

Change https://go.dev/cl/396514 mentions this issue: sort: add Find function

Merovius · 2022-04-05T05:29:45Z

This came up on golang-nuts, where someone asked the predicate-based version to return. I also think it should.

It is easy, when you have a comparison function, to get to a predicate:

var b T
pred := func(a T) bool { return cmp(a, b) >= 0 }

But not the other way around, so a predicate is more general.

I used the predicate version of sort.Search to look for prefix-matches in a sorted slice of paths. i.e. I was not looking for an equality match of the needle, but also something of which the needle is a prefix.

When I saw this issue fly by, I assumed this would be about adding a simpler version of sort.Search to make the common use-case simpler, which seemed a good idea, so I didn't follow this discussion. I would've spoken up sooner, if I expected the more general version to disappear altogether.

ianlancetaylor · 2022-04-06T00:24:15Z

I agree that a predicate is more general, but the slice still has to be sorted according to the predicate. And the comparison function seems easier to reason about. And given a predicate you can get the comparison by calling the predicate twice. And slices.BinarySearchFunc will return the index you want even if the element is not present.

So, maybe?

For #50340 Change-Id: I3b4d278affc8e7ec706db8c9777f7a8c8ce7441d Reviewed-on: https://go-review.googlesource.com/c/go/+/396514 Reviewed-by: Ian Lance Taylor <iant@golang.org> Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Russ Cox <rsc@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>

DeedleFake · 2022-04-19T20:13:44Z

@rogpeppe

Maybe a good reason to add a time.Time.Compare method:

As long as the comparison function isn't required to only return -1, 0, or 1, time.Time.Sub is already a comparison function, except that it returns time.Duration, not int.

rogpeppe · 2022-04-21T13:39:17Z

@DeedleFake Yes, true, although we do have a fairly strong convention for the way that comparison functions work already.
Still, we could potentially allow this with:

func BinarySearchFunc[T any, C constraints.Ordered](slice []T, target T, cmp func(T, T) C) (int, bool)

which would allow BinarySearchFunc([]time.Time{...}, t, time.Time.Sub)

Maybe worth considering?

DeedleFake · 2022-04-21T14:02:50Z

I was actually going to mention that in my comment, but decided against it because it seemed too silly, and it also doesn't work with the index-based search, anyways.

rogpeppe · 2022-04-21T14:12:21Z

@DeedleFake

it also doesn't work with the index-based search, anyways.

How so?

DeedleFake · 2022-04-21T14:16:19Z

Because the comparison function in BinarySearch(length int, cmp func(int) int) wants an index into the data structure, so time.Time.Sub can't be used directly regardless. I completely forgot about the BinarySearchFunc variant.

Merovius · 2022-04-21T14:16:35Z

If anything, we should make the return type of the comparison stronger, not weaker. That is, something like

type Ordering int

const (
    Lt Ordering = -1
    Eq Ordering = 0
    Gt Ordering = 1 
)

constraints.Ordered seems very wrong, in any case. It includes string types, which… okay. But also float64, which is very bad with all them NaNs.

DeedleFake · 2022-04-21T14:28:39Z

Oh, I didn't notice which constraint that was. Yeah, don't use constraints.Ordered. It also includes string and unsigned ints, neither which makes any sense. Use constraints.Signed.

odeke-em · 2022-05-04T19:34:32Z

@eliben thank you for CL https://go-review.googlesource.com/c/go/+/396514 which I believe implemented this feature. @eliben @rsc @ianlancetaylor shall we close this issue then? Just waiting for your confirmation before I close it. Thank you everyone for the discourse and hard work.

rsc · 2022-08-05T16:01:30Z

Shipped in Go 1.19.

robpike added the Proposal label Dec 24, 2021

gopherbot added this to the Proposal milestone Dec 24, 2021

ianlancetaylor added this to Incoming in Proposals (old) Dec 24, 2021

rsc moved this from Incoming to Active in Proposals (old) Jan 12, 2022

rsc changed the title ~~proposal: sort: add a boolean-valued search function~~ proposal: sort: add Find Jan 26, 2022

rogpeppe mentioned this issue Jan 27, 2022

time: add Time.Compare #50770

Closed

rsc added this to the Backlog milestone Mar 23, 2022

ianlancetaylor mentioned this issue Mar 28, 2022

slices: new package to provide generic slice functions #45955

Closed

tmaxmax mentioned this issue Mar 29, 2022

proposal: x/exp/slices: Find function to select first element matching a condition #52006

Closed

cherrymui mentioned this issue Jun 9, 2022

api: audit for Go 1.19 #53310

Closed

rsc closed this as completed Aug 5, 2022

earthboundkid mentioned this issue Nov 2, 2022

math: add Compare and Compare32 #56491

Closed

Merovius mentioned this issue Dec 16, 2022

x/exp/slices: Allow different types for haystack/needle in BinarySearchFunc #57348

Closed

ianlancetaylor mentioned this issue Feb 16, 2023

slices: new standard library package based on x/exp/slices #57433

Closed

twmb mentioned this issue Mar 27, 2023

proposal: exp/slices: change slices.BinarySearchFunc E, T ordering in the closure #59260

Closed

Merovius mentioned this issue May 11, 2023

slices: add sorting and comparison functions #60091

Closed

dmitshur modified the milestones: Backlog, Go1.19 May 30, 2023

cuishuang mentioned this issue Feb 29, 2024

proposal: slices: add a reusable slices.Contains() #66022

Open

sort: add Find #50340

sort: add Find #50340

Comments

robpike commented Dec 24, 2021 • edited

robpike commented Dec 24, 2021

DeedleFake commented Dec 26, 2021 • edited

robpike commented Dec 26, 2021

DeedleFake commented Dec 27, 2021 • edited

robpike commented Dec 27, 2021 • edited

robpike commented Dec 27, 2021

mlevieux commented Jan 3, 2022

rsc commented Jan 12, 2022

rsc commented Jan 19, 2022

rsc commented Jan 19, 2022

robpike commented Jan 20, 2022

rsc commented Jan 26, 2022

rogpeppe commented Jan 27, 2022

ianlancetaylor commented Jan 27, 2022

rogpeppe commented Jan 27, 2022 • edited

rsc commented Feb 2, 2022

DeedleFake commented Feb 2, 2022 • edited

rogpeppe commented Feb 3, 2022 • edited

rogpeppe commented Feb 3, 2022

ianlancetaylor commented Feb 3, 2022

rsc commented Feb 9, 2022

ianlancetaylor commented Feb 9, 2022

eliben commented Feb 10, 2022

rogpeppe commented Feb 10, 2022

eliben commented Mar 24, 2022

gopherbot commented Mar 24, 2022

eliben commented Mar 28, 2022

gopherbot commented Mar 29, 2022

Merovius commented Apr 5, 2022 • edited

ianlancetaylor commented Apr 6, 2022

DeedleFake commented Apr 19, 2022

rogpeppe commented Apr 21, 2022

DeedleFake commented Apr 21, 2022

rogpeppe commented Apr 21, 2022

DeedleFake commented Apr 21, 2022

Merovius commented Apr 21, 2022

DeedleFake commented Apr 21, 2022 • edited

odeke-em commented May 4, 2022

rsc commented Aug 5, 2022

robpike commented Dec 24, 2021 •

edited

DeedleFake commented Dec 26, 2021 •

edited

DeedleFake commented Dec 27, 2021 •

edited

robpike commented Dec 27, 2021 •

edited

rogpeppe commented Jan 27, 2022 •

edited

DeedleFake commented Feb 2, 2022 •

edited

rogpeppe commented Feb 3, 2022 •

edited

Merovius commented Apr 5, 2022 •

edited

DeedleFake commented Apr 21, 2022 •

edited