Issue 2997041: code review 2997041: sort: binary search for sorted slices

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+186 lines, -0 lines)			Patch
M	src/pkg/sort/Makefile	View		1 chunk	+1 line, -0 lines	0 comments	Download
A	src/pkg/sort/search.go	View	1 2 3 4 5 6 7 8 9 10	1 chunk	+102 lines, -0 lines	2 comments	Download
A	src/pkg/sort/search_test.go	View	1 2 3 4 5 6	1 chunk	+83 lines, -0 lines	0 comments	Download

Messages

Total messages: 16

Expand All Messages | Collapse All Messages

gri

Hello r (cc: golang-dev@googlegroups.com), I'd like you to review this change.

13 years, 6 months ago (2010-11-09 01:04:24 UTC) #1

http://codereview.appspot.com/2997041/diff/3001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/3001/src/pkg/sort/search.go#newcode24 src/pkg/sort/search.go:24: // (Implementation according to: "A Method of Programming", E.W. ...

13 years, 6 months ago (2010-11-09 01:35:30 UTC) #2

gri

Hello r (cc: golang-dev@googlegroups.com), I'd like you to review this change.

13 years, 6 months ago (2010-11-10 04:45:41 UTC) #3

http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newcode11 src/pkg/sort/search.go:11: // relation '<=' and the value x are represented ...

13 years, 6 months ago (2010-11-10 05:37:01 UTC) #4

gri

On Tue, Nov 9, 2010 at 9:37 PM, <r@golang.org> wrote: > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go > > ...

13 years, 6 months ago (2010-11-10 06:19:37 UTC) #5

On Nov 9, 2010, at 10:19 PM, Robert Griesemer wrote: > On Tue, Nov 9, ...

13 years, 6 months ago (2010-11-10 07:33:58 UTC) #6

never mind, i see - i think. i need to think about it some more. ...

13 years, 6 months ago (2010-11-10 07:36:06 UTC) #7

gri

Hello r, r2 (cc: golang-dev@googlegroups.com), I'd like you to review this change.

13 years, 6 months ago (2010-11-10 18:09:05 UTC) #8

http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newcode10 src/pkg/sort/search.go:10: // and sorted data structure of n elements. The ...

13 years, 6 months ago (2010-11-10 19:54:05 UTC) #9

gri

Hello r, r2 (cc: golang-dev@googlegroups.com), Please take another look.

13 years, 6 months ago (2010-11-10 20:47:01 UTC) #10

http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newcode19 src/pkg/sort/search.go:19: // order or "greater or equal" if the data ...

13 years, 6 months ago (2010-11-10 20:53:47 UTC) #11

LGTM http://codereview.appspot.com/2997041/diff/45001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/45001/src/pkg/sort/search.go#newcode20 src/pkg/sort/search.go:20: // is calling the function f with index ...

13 years, 6 months ago (2010-11-10 21:13:17 UTC) #12

gri

*** Submitted as http://code.google.com/p/go/source/detail?r=232237c810ad *** sort: binary search for sorted slices R=r, r2 CC=golang-dev http://codereview.appspot.com/2997041 ...

13 years, 6 months ago (2010-11-10 21:19:34 UTC) #13

rog

i sincerely apologise for bikeshedding on this - i should have paid more attention earlier. ...

13 years, 6 months ago (2010-11-11 16:57:29 UTC) #14

i sincerely apologise for bikeshedding on this - i should have paid more
attention earlier.

i think the explanation and usage could be considerably cleaner
if the specification were to change slightly.

currently the spec says:

// If data[0] <= x and x <= data[n-1], Search returns the index i with:
//
//	data[i] <= x && x <= data[i+1]
//
// where data[n] is assumed to be larger than any x.

this cannot be true, because f cannot test for x <= data[i].
moreover it's ambiguous if there's more than one occurrence
of x in data. i think you probably meant to say:

	data[i] <= x && x < data[i+1]

but if you change the condition slightly, then you don't need
any of the additional caveats.

how about this, as a proposed change:

// Search returns the index i with:
//
//	data[i-1] < x && x <= data[i]
//
// where data[-1] is assumed to be smaller than any x and data[n] is
// assumed to be larger than any x.  It is the responsibility of the
// caller to verify the actual presence by testing if data[i] == x.
// Thus 0 <= i <= n and i is the first index of x if x is present in the data.

the search example remains the same.

this means that Search always returns either the position
of the element, if it exists, or the position that it should be inserted
at.

the sense of the function would have to change (from <= to <) as we're
now searching for the first i: f(i)==false, rather than the
last i: f(i)==true. intuitively, we want to stop just
before the first element >= x, not at the last element <= x.

the modified algorithm might look like this, although i'm afraid i
haven't got Dijkstra and i have not proved it for correctness
or efficiency.

func Search(n int, f func(int) bool) int {
        i, j := 0, n
        for i+1 < j {
                h := i + (j-i)/2 // avoid overflow when computing h
                // i <= h <= j
                if f(h) {
                        // data[h] < x
                        i = h + 1
                } else {
                        // // x <= data[h]
                        j = h
                }
        }
        // test the final element that the loop did not.
        if i < j && f(i) {
                i++
        }
        return i
}

there's another advantage too, which is that if there are multiple
occurrences of x in the data set, Search returns the first occurrence
rather than the last, and it's more idiomatic to search forward
than backwards:

  for i := Search(n, f); i < n && data[i] == x; i++ {
  }

vs.

  i := Search(n, f)
  if i < n {
      for ; i >= 0 && data[i] == x; i-- {
      }
   }

it also means that the code for sorted insertion is considerably more
straightforward (see my post in golang-nuts)

again, sorry for the bikeshed, but i definitely think this is a change
worth making.

gri

On Thu, Nov 11, 2010 at 8:57 AM, roger peppe <rogpeppe@gmail.com> wrote: > i sincerely ...

13 years, 6 months ago (2010-11-11 19:34:02 UTC) #15

On Thu, Nov 11, 2010 at 8:57 AM, roger peppe <rogpeppe@gmail.com> wrote:

> i sincerely apologise for bikeshedding on this - i should have paid more
> attention earlier.
>
> i think the explanation and usage could be considerably cleaner
> if the specification were to change slightly.
>
> currently the spec says:
>
> // If data[0] <= x and x <= data[n-1], Search returns the index i with:
> //
> //      data[i] <= x && x <= data[i+1]
> //
> // where data[n] is assumed to be larger than any x.
>
> this cannot be true, because f cannot test for x <= data[i].
>

you're right - this is a typo that crept in - it's been fixed


> moreover it's ambiguous if there's more than one occurrence
> of x in data. i think you probably meant to say:
>
>        data[i] <= x && x < data[i+1]
>

yes

>
>

> but if you change the condition slightly, then you don't need
> any of the additional caveats.
>
> how about this, as a proposed change:
>
> // Search returns the index i with:
> //
> //      data[i-1] < x && x <= data[i]
> //
> // where data[-1] is assumed to be smaller than any x and data[n] is
> // assumed to be larger than any x.  It is the responsibility of the
> // caller to verify the actual presence by testing if data[i] == x.
> // Thus 0 <= i <= n and i is the first index of x if x is present in the
> data.
>

This sounds good to me; I actually was shooting for such a result at first,
but wasn't successful in making it work with what looks like the same amount
of calls to f (at least on paper).

>
> the search example remains the same.


Not quite. Since the result can be n one would have to check both against
len(data) == 0 (empty data) and i != n before testing data[i] == elem.
Admittedly a negligible cost.

>


> this means that Search always returns either the position
> of the element, if it exists, or the position that it should be inserted
> at.
>
> the sense of the function would have to change (from <= to <) as we're
> now searching for the first i: f(i)==false, rather than the
> last i: f(i)==true. intuitively, we want to stop just
> before the first element >= x, not at the last element <= x.
>
> the modified algorithm might look like this, although i'm afraid i
> haven't got Dijkstra and i have not proved it for correctness
>

a quick test indicates that this code is correct


> or efficiency.
>
> func Search(n int, f func(int) bool) int {
>        i, j := 0, n
>        for i+1 < j {
>                h := i + (j-i)/2 // avoid overflow when computing h
>                // i <= h <= j
>                if f(h) {
>                        // data[h] < x
>                        i = h + 1
>                } else {
>                        // // x <= data[h]
>                        j = h
>                }
>        }
>        // test the final element that the loop did not.
>        if i < j && f(i) {
>                i++
>        }
>        return i
> }
>

I am happy to change the current implementation if we can show that this is
not significantly worse than what we have now. For instance, superficially
at least it seems that the loop is slightly asymmetric: the lower bound i
moves up more quickly because of the +1 than the upper bound j. But I am not
sure that is really true since /2 "rounds down". Also, there appears to be
an extra call to f at the end (which can be costly), but perhaps there is
fewer iterations because of the +1.



>
> there's another advantage too, which is that if there are multiple
> occurrences of x in the data set, Search returns the first occurrence
> rather than the last, and it's more idiomatic to search forward
> than backwards:
>

agreed

>
>  for i := Search(n, f); i < n && data[i] == x; i++ {
>  }
>
> vs.
>
>  i := Search(n, f)
>  if i < n {
>      for ; i >= 0 && data[i] == x; i-- {
>      }
>   }
>
> it also means that the code for sorted insertion is considerably more
> straightforward (see my post in golang-nuts)
>
> again, sorry for the bikeshed, but i definitely think this is a change
> worth making.
>

I think it's not a clear cut if there's indeed an extra call to f. f might
be quite expensive, and Search might be used with an expensive f for
smallish data sets. Adding one call to f in this case would make it
significantly slower.

- gri

rog

13 years, 6 months ago (2010-11-11 19:57:59 UTC) #16

On 11 November 2010 19:33, Robert Griesemer <gri@golang.org> wrote:
> This sounds good to me; I actually was shooting for such a result at first,
> but wasn't successful in making it work with what looks like the same amount
> of calls to f (at least on paper).

yes, i'm not sure it is possible. the current algorithm is imprecise
by one bit when the element is not found - i can't prove
there's no better one though.

>> the search example remains the same.
>
> Not quite. Since the result can be n one would have to check both against
> len(data) == 0 (empty data) and i != n before testing data[i] == elem.
> Admittedly a negligible cost.

oops, you're right, the example is slightly different.
but checking i < n before testing data[i] == elem is sufficient, which
is the same cost,
as the example, although admittedly there's a slight cost if the len(data) check
had been done previously.

> I think it's not a clear cut if there's indeed an extra call to f. f might
> be quite expensive, and Search might be used with an expensive f for
> smallish data sets. Adding one call to f in this case would make it
> significantly slower.

yes. although personally i'd prefer the simpler semantics as they
make it easier to understand code that's using Search.
when performance is crucial it would usually be straightforward
to replace Search by another function with the original semantics,
as the existence test:

   if i < n && data[i] == x {...}

is sufficient for either algorithm.

Expand All Messages | Collapse All Messages

Issue 2997041: code review 2997041: sort: binary search for sorted slices (Closed)

Description

Patch Set 1 #

Patch Set 2 : code review 2997041: sort: binary search for sorted slices #

Patch Set 3 : code review 2997041: sort: binary search for sorted slices #

Patch Set 4 : code review 2997041: sort: binary search for sorted slices #

Patch Set 5 : code review 2997041: sort: binary search for sorted slices #

Patch Set 6 : code review 2997041: sort: binary search for sorted slices #

Patch Set 7 : code review 2997041: sort: binary search for sorted slices #

Patch Set 8 : code review 2997041: sort: binary search for sorted slices #

Patch Set 9 : code review 2997041: sort: binary search for sorted slices #

Patch Set 10 : code review 2997041: sort: binary search for sorted slices #

Patch Set 11 : code review 2997041: sort: binary search for sorted slices #

Messages