|
|
Descriptionsort: binary search for sorted slices
Patch Set 1 #Patch Set 2 : code review 2997041: sort: binary search for sorted slices #
Total comments: 3
Patch Set 3 : code review 2997041: sort: binary search for sorted slices #Patch Set 4 : code review 2997041: sort: binary search for sorted slices #
Total comments: 3
Patch Set 5 : code review 2997041: sort: binary search for sorted slices #Patch Set 6 : code review 2997041: sort: binary search for sorted slices #Patch Set 7 : code review 2997041: sort: binary search for sorted slices #Patch Set 8 : code review 2997041: sort: binary search for sorted slices #
Total comments: 9
Patch Set 9 : code review 2997041: sort: binary search for sorted slices #
Total comments: 11
Patch Set 10 : code review 2997041: sort: binary search for sorted slices #Patch Set 11 : code review 2997041: sort: binary search for sorted slices #
Total comments: 2
MessagesTotal messages: 16
Hello r (cc: golang-dev@googlegroups.com), I'd like you to review this change.
Sign in to reply to this message.
http://codereview.appspot.com/2997041/diff/3001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/3001/src/pkg/sort/search.go#newcode24 src/pkg/sort/search.go:24: // (Implementation according to: "A Method of Programming", E.W. Dijkstra.) this is ok, but i'm not sure it's necessary to cite http://codereview.appspot.com/2997041/diff/3001/src/pkg/sort/search.go#newcode29 src/pkg/sort/search.go:29: h := i + (j-i)/2 // i < h < j might be worth explaining the worry about overflow http://codereview.appspot.com/2997041/diff/3001/src/pkg/sort/search_test.go File src/pkg/sort/search_test.go (right): http://codereview.appspot.com/2997041/diff/3001/src/pkg/sort/search_test.go#n... src/pkg/sort/search_test.go:18: func TestSearchNotFound(t *testing.T) { these tests could all be folded into a table and checked with reflect.DeepEqual
Sign in to reply to this message.
Hello r (cc: golang-dev@googlegroups.com), I'd like you to review this change.
Sign in to reply to this message.
http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:11: // relation '<=' and the value x are represented by the function f, which the sort order is defined by the function. you should perhaps rewrite to explain that. this search would work fine with a descending sort if you used a >= operator (and you should write a test for that) http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:16: // where data(i) indicates the i'th element of the data structure. perhaps here is the place to clarify that http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:68: // SearchFloats searches x in a sorted slice of strings and returns the index SearchStrings
Sign in to reply to this message.
On Tue, Nov 9, 2010 at 9:37 PM, <r@golang.org> wrote: > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go > > File src/pkg/sort/search.go (right): > > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... > src/pkg/sort/search.go:11: // relation '<=' and the value x are > represented by the function f, which > the sort order is defined by the function. you should perhaps rewrite > to explain that. this search would work fine with a descending sort if > you used a >= operator (and you should write a test for that) > actually it doesn't: search as written requires that the sort order is ascending. More precisely, if i < j, data(i) <= data(j) must be true, otherwise it won't work (that's how the search decides how to move the index, after all. One could provide an extra parameter to search (ascending, descending) and all the convenience functions. Alternatively, one could provide duplicate functions (search up and down). Any preferences? > > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... > src/pkg/sort/search.go:16: // where data(i) indicates the i'th element > of the data structure. > perhaps here is the place to clarify that > > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... > src/pkg/sort/search.go:68: // SearchFloats searches x in a sorted slice > of strings and returns the index > SearchStrings done. > > > http://codereview.appspot.com/2997041/ >
Sign in to reply to this message.
On Nov 9, 2010, at 10:19 PM, Robert Griesemer wrote: > On Tue, Nov 9, 2010 at 9:37 PM, <r@golang.org> wrote: > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go > > File src/pkg/sort/search.go (right): > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... > src/pkg/sort/search.go:11: // relation '<=' and the value x are > represented by the function f, which > the sort order is defined by the function. you should perhaps rewrite > to explain that. this search would work fine with a descending sort if > you used a >= operator (and you should write a test for that) > > actually it doesn't: search as written requires that the sort order is ascending. More precisely, if i < j, data(i) <= data(j) must be true, otherwise it won't work (that's how the search decides how to move the index, after all. is that true? isn't it that if i < j, f(i) <= f(j)? if that's not it, i don't understand what's going on. > One could provide an extra parameter to search (ascending, descending) and all the convenience functions. Alternatively, one could provide duplicate functions (search up and down). > Any preferences? > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... > src/pkg/sort/search.go:16: // where data(i) indicates the i'th element > of the data structure. > perhaps here is the place to clarify that > > http://codereview.appspot.com/2997041/diff/12001/src/pkg/sort/search.go#newco... > src/pkg/sort/search.go:68: // SearchFloats searches x in a sorted slice > of strings and returns the index > SearchStrings > done. > > > http://codereview.appspot.com/2997041/ >
Sign in to reply to this message.
never mind, i see - i think. i need to think about it some more. if the sort order can't be defined by f, though, the model isn't right. -rob
Sign in to reply to this message.
Hello r, r2 (cc: golang-dev@googlegroups.com), I'd like you to review this change.
Sign in to reply to this message.
http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:10: // and sorted data structure of n elements. The sorting order and criteria as s/and criteria/ http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:11: // well as the value x are represented by the function f, which is typically s/value x/x, the value to be searched for,/ http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:11: // well as the value x are represented by the function f, which is typically s/represented/captured/ http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:12: // implemented as: you want to say something like The argument function f captures the value to be searched for, how the elements are indexed, and how they are sorted. It will often be passed as a closure. For instance, given a slice of integers, []data, sorted in increasing order, the function func(i int) bool { return data[i] <= 23 } can be used to search for the value 23 in data. The relationship expressed by the function must be "less or equal" if the data are sorted in increasing order or "greater or equal" if the data are sorted in decreasing order. http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:21: // If data(0) <= x and x <= data(n-1), that is, if x might be present in the maybe use [] sted () for indexing http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:34: // For instance, to find the element x in a slice s sorted in ascending order s/x/elem/ s/slice/integer slice/ http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:34: // For instance, to find the element x in a slice s sorted in ascending order s/For instance/ something like To complete the example above, ... http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:36: // change this example to use the f from above http://codereview.appspot.com/2997041/diff/30004/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:37: // i := sort.Search(len(s), func(i int) bool { return s[i] <= x }) s/x/elem/ here and below
Sign in to reply to this message.
Hello r, r2 (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:19: // order or "greater or equal" if the data are sorted in descending order. -,.s/data/elements/ to echo the word above and avoid the plural confusion http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:21: // For brevity, in the following ascending sort order is assumed (for descending For brevity, this discussion assumes ascending sort order. For descending order, replace... (no parens, two sentences, don't need 'sort' in the second sentence) http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:25: // in the data, Search returns the index i with: drop the comma-delimited clause. it's clear from context and later explanation http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:27: // data[i] <= x && x <= data[i+1] // == f(i) && !f(i+1) what is f(i+1) if i==n-1? maybe just drop the comment http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:30: // if it were present in the data. It is the responsibility of the caller to s/were/is/ http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:38: // elem in an integer slice s sorted in ascending order: rename s to data so it's the same example, or use s in the earlier example.
Sign in to reply to this message.
LGTM http://codereview.appspot.com/2997041/diff/45001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/45001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:20: // is calling the function f with index values ranging from 0 to n-1. The function f will be called with values of i in the range 0 to n-1. http://codereview.appspot.com/2997041/diff/45001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:23: // order, replace <= with >=, and swap 'smaller' with 'larger. s/swap //
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=232237c810ad *** sort: binary search for sorted slices R=r, r2 CC=golang-dev http://codereview.appspot.com/2997041 Committer: Robert Griesemer <gri@golang.org> http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go File src/pkg/sort/search.go (right): http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:21: // For brevity, in the following ascending sort order is assumed (for descending On 2010/11/10 20:53:47, r wrote: > For brevity, this discussion assumes ascending sort order. For descending > order, replace... > > (no parens, two sentences, don't need 'sort' in the second sentence) Done. http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:25: // in the data, Search returns the index i with: On 2010/11/10 20:53:47, r wrote: > drop the comma-delimited clause. it's clear from context and later explanation Done. http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:27: // data[i] <= x && x <= data[i+1] // == f(i) && !f(i+1) On 2010/11/10 20:53:47, r wrote: > what is f(i+1) if i==n-1? maybe just drop the comment Done. http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:30: // if it were present in the data. It is the responsibility of the caller to On 2010/11/10 20:53:47, r wrote: > s/were/is/ Done. http://codereview.appspot.com/2997041/diff/37001/src/pkg/sort/search.go#newco... src/pkg/sort/search.go:38: // elem in an integer slice s sorted in ascending order: On 2010/11/10 20:53:47, r wrote: > rename s to data so it's the same example, or use s in the earlier example. Done.
Sign in to reply to this message.
i sincerely apologise for bikeshedding on this - i should have paid more attention earlier. i think the explanation and usage could be considerably cleaner if the specification were to change slightly. currently the spec says: // If data[0] <= x and x <= data[n-1], Search returns the index i with: // // data[i] <= x && x <= data[i+1] // // where data[n] is assumed to be larger than any x. this cannot be true, because f cannot test for x <= data[i]. moreover it's ambiguous if there's more than one occurrence of x in data. i think you probably meant to say: data[i] <= x && x < data[i+1] but if you change the condition slightly, then you don't need any of the additional caveats. how about this, as a proposed change: // Search returns the index i with: // // data[i-1] < x && x <= data[i] // // where data[-1] is assumed to be smaller than any x and data[n] is // assumed to be larger than any x. It is the responsibility of the // caller to verify the actual presence by testing if data[i] == x. // Thus 0 <= i <= n and i is the first index of x if x is present in the data. the search example remains the same. this means that Search always returns either the position of the element, if it exists, or the position that it should be inserted at. the sense of the function would have to change (from <= to <) as we're now searching for the first i: f(i)==false, rather than the last i: f(i)==true. intuitively, we want to stop just before the first element >= x, not at the last element <= x. the modified algorithm might look like this, although i'm afraid i haven't got Dijkstra and i have not proved it for correctness or efficiency. func Search(n int, f func(int) bool) int { i, j := 0, n for i+1 < j { h := i + (j-i)/2 // avoid overflow when computing h // i <= h <= j if f(h) { // data[h] < x i = h + 1 } else { // // x <= data[h] j = h } } // test the final element that the loop did not. if i < j && f(i) { i++ } return i } there's another advantage too, which is that if there are multiple occurrences of x in the data set, Search returns the first occurrence rather than the last, and it's more idiomatic to search forward than backwards: for i := Search(n, f); i < n && data[i] == x; i++ { } vs. i := Search(n, f) if i < n { for ; i >= 0 && data[i] == x; i-- { } } it also means that the code for sorted insertion is considerably more straightforward (see my post in golang-nuts) again, sorry for the bikeshed, but i definitely think this is a change worth making.
Sign in to reply to this message.
On Thu, Nov 11, 2010 at 8:57 AM, roger peppe <rogpeppe@gmail.com> wrote: > i sincerely apologise for bikeshedding on this - i should have paid more > attention earlier. > > i think the explanation and usage could be considerably cleaner > if the specification were to change slightly. > > currently the spec says: > > // If data[0] <= x and x <= data[n-1], Search returns the index i with: > // > // data[i] <= x && x <= data[i+1] > // > // where data[n] is assumed to be larger than any x. > > this cannot be true, because f cannot test for x <= data[i]. > you're right - this is a typo that crept in - it's been fixed > moreover it's ambiguous if there's more than one occurrence > of x in data. i think you probably meant to say: > > data[i] <= x && x < data[i+1] > yes > > > but if you change the condition slightly, then you don't need > any of the additional caveats. > > how about this, as a proposed change: > > // Search returns the index i with: > // > // data[i-1] < x && x <= data[i] > // > // where data[-1] is assumed to be smaller than any x and data[n] is > // assumed to be larger than any x. It is the responsibility of the > // caller to verify the actual presence by testing if data[i] == x. > // Thus 0 <= i <= n and i is the first index of x if x is present in the > data. > This sounds good to me; I actually was shooting for such a result at first, but wasn't successful in making it work with what looks like the same amount of calls to f (at least on paper). > > the search example remains the same. Not quite. Since the result can be n one would have to check both against len(data) == 0 (empty data) and i != n before testing data[i] == elem. Admittedly a negligible cost. > > this means that Search always returns either the position > of the element, if it exists, or the position that it should be inserted > at. > > the sense of the function would have to change (from <= to <) as we're > now searching for the first i: f(i)==false, rather than the > last i: f(i)==true. intuitively, we want to stop just > before the first element >= x, not at the last element <= x. > > the modified algorithm might look like this, although i'm afraid i > haven't got Dijkstra and i have not proved it for correctness > a quick test indicates that this code is correct > or efficiency. > > func Search(n int, f func(int) bool) int { > i, j := 0, n > for i+1 < j { > h := i + (j-i)/2 // avoid overflow when computing h > // i <= h <= j > if f(h) { > // data[h] < x > i = h + 1 > } else { > // // x <= data[h] > j = h > } > } > // test the final element that the loop did not. > if i < j && f(i) { > i++ > } > return i > } > I am happy to change the current implementation if we can show that this is not significantly worse than what we have now. For instance, superficially at least it seems that the loop is slightly asymmetric: the lower bound i moves up more quickly because of the +1 than the upper bound j. But I am not sure that is really true since /2 "rounds down". Also, there appears to be an extra call to f at the end (which can be costly), but perhaps there is fewer iterations because of the +1. > > there's another advantage too, which is that if there are multiple > occurrences of x in the data set, Search returns the first occurrence > rather than the last, and it's more idiomatic to search forward > than backwards: > agreed > > for i := Search(n, f); i < n && data[i] == x; i++ { > } > > vs. > > i := Search(n, f) > if i < n { > for ; i >= 0 && data[i] == x; i-- { > } > } > > it also means that the code for sorted insertion is considerably more > straightforward (see my post in golang-nuts) > > again, sorry for the bikeshed, but i definitely think this is a change > worth making. > I think it's not a clear cut if there's indeed an extra call to f. f might be quite expensive, and Search might be used with an expensive f for smallish data sets. Adding one call to f in this case would make it significantly slower. - gri
Sign in to reply to this message.
On 11 November 2010 19:33, Robert Griesemer <gri@golang.org> wrote: > This sounds good to me; I actually was shooting for such a result at first, > but wasn't successful in making it work with what looks like the same amount > of calls to f (at least on paper). yes, i'm not sure it is possible. the current algorithm is imprecise by one bit when the element is not found - i can't prove there's no better one though. >> the search example remains the same. > > Not quite. Since the result can be n one would have to check both against > len(data) == 0 (empty data) and i != n before testing data[i] == elem. > Admittedly a negligible cost. oops, you're right, the example is slightly different. but checking i < n before testing data[i] == elem is sufficient, which is the same cost, as the example, although admittedly there's a slight cost if the len(data) check had been done previously. > I think it's not a clear cut if there's indeed an extra call to f. f might > be quite expensive, and Search might be used with an expensive f for > smallish data sets. Adding one call to f in this case would make it > significantly slower. yes. although personally i'd prefer the simpler semantics as they make it easier to understand code that's using Search. when performance is crucial it would usually be straightforward to replace Search by another function with the original semantics, as the existence test: if i < n && data[i] == x {...} is sufficient for either algorithm.
Sign in to reply to this message.
|