|
|
Created:
13 years, 5 months ago by gri Modified:
13 years, 5 months ago Reviewers:
CC:
rsc, r2, nigeltao, r, dsymonds, golang-dev Visibility:
Public. |
Descriptionencoding/binary: support for varint encoding
Patch Set 1 #Patch Set 2 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #Patch Set 3 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #Patch Set 4 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #Patch Set 5 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #Patch Set 6 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #
Total comments: 4
Patch Set 7 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #Patch Set 8 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #Patch Set 9 : diff -r c8a7809fabcd https://go.googlecode.com/hg/ #
Total comments: 34
Patch Set 10 : diff -r 9a44dedca5dd https://go.googlecode.com/hg/ #
Total comments: 12
Patch Set 11 : diff -r 9a44dedca5dd https://go.googlecode.com/hg/ #Patch Set 12 : diff -r 9a44dedca5dd https://go.googlecode.com/hg/ #
Total comments: 12
Patch Set 13 : diff -r 7ae32f195afd https://go.googlecode.com/hg/ #
MessagesTotal messages: 18
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
http://codereview.appspot.com/5146048/diff/10001/src/pkg/encoding/binary/vari... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/10001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:19: // PutUint encodes a uint64 into buf and returns the number of bytes written. I think PutUint is too generic a name. It needs to say varint somehow. Possibly: PutVarint PutUvarint Varint Uvarint ReadVarint ReadUvarint WriteVarint WriteUvarint http://codereview.appspot.com/5146048/diff/10001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:23: if x < 128 { 128 is fine but you are really using this as a bit so 0x80 and b&^0x80 seems clearer.
Sign in to reply to this message.
PTAL. I also changed error handling a bit: - to be compatible with longer encodings (e.g. 128 bit), decoding continues until the continuation bit is not set (or EOF) - overflow is detected accurately http://codereview.appspot.com/5146048/diff/10001/src/pkg/encoding/binary/vari... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/10001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:19: // PutUint encodes a uint64 into buf and returns the number of bytes written. On 2011/09/28 01:28:21, rsc wrote: > I think PutUint is too generic a name. It needs to say > varint somehow. > > Possibly: > > PutVarint > PutUvarint > > Varint > Uvarint > > ReadVarint > ReadUvarint > WriteVarint > WriteUvarint Done. http://codereview.appspot.com/5146048/diff/10001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:23: if x < 128 { On 2011/09/28 01:28:21, rsc wrote: > 128 is fine but you are really using this as a > bit so 0x80 and b&^0x80 seems clearer. Done.
Sign in to reply to this message.
the U is in the wrong place but it looks truly horrible in the right place, so wrong it will be. i'll look at this tomorrow. -rob
Sign in to reply to this message.
I'd also add // MaxVarintLen is the maximum length of a varint-encoded int64 or uint64. const MaxVarintLen = 10 http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:5: // This file implements "varint" encoding of 64bit integers. If you put this file-specific comment between the package clause and the imports, then it's less likely to be mistaken for a package doc comment. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:93: var overflow = os.NewError("overflow: not a 64bit integer") I'd make the string "binary: varint overflows a 64bit integer". The "binary" prefix gives a hint for what package to look for when encountering an error.
Sign in to reply to this message.
http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:6: // The encoding is as follows: The encoding is: (it always follows) http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:7: // - unsigned integers are serialized 7bits at a time, starting with the lsb s/7bits/7 bits/ s/lsb/least significant bit (lsb)/ http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:8: // - the msb in each byte indicates if there is a continuation byte (msb = 1) s/msb/most significant bit (msb)/ s/byte/output byte/ http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:10: // encoding (see PutVarint) i see nothing about zig-zag in the comments for PutVarint. just tell me the encoding here, please. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:13: // At most 10 bytes are needed for 64bit values. The encoding could be more s/64bit/64-bit/ http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:14: // dense: A full 64bit value needs an extra byte just to hold bit 63. Instead, ditto http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:19: // format incompatible with a varint encoding for larger numbers (say 128bit). ditt sort of http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:29: // If the buffer is to small, the result is 0. s/to/too/ http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:44: // undefined and the number of bytes is zero. If overflow occured, s/occured/occurred/ http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:52: return x, -i // overflow why not 0, -i and say so in the comment? anyway i'd be happier with an error return in all these failures but won't press for it http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:63: // If the buffer is to small, the result is 0. s/to/too/ http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:75: // the number of bytes returned is negated. same comments as above. (be sure to fix "occured") http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:85: // WriteUvarint writes a uint64 to w. WriteUvarint encodes x and writes the result to w. same below. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:95: // ReadUvarint reads a uint64 from r. ReadUvarint reads an encoded unsigned integer from r and returns it as a uint64. similarly below.
Sign in to reply to this message.
PTAL. Also: added missing tests. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:5: // This file implements "varint" encoding of 64bit integers. On 2011/09/28 10:09:03, nigeltao wrote: > If you put this file-specific comment between the package clause and the > imports, then it's less likely to be mistaken for a package doc comment. Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:5: // This file implements "varint" encoding of 64bit integers. On 2011/09/28 10:09:03, nigeltao wrote: > If you put this file-specific comment between the package clause and the > imports, then it's less likely to be mistaken for a package doc comment. Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:6: // The encoding is as follows: On 2011/09/28 16:10:09, r wrote: > The encoding is: > (it always follows) Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:7: // - unsigned integers are serialized 7bits at a time, starting with the lsb On 2011/09/28 16:10:09, r wrote: > s/7bits/7 bits/ > s/lsb/least significant bit (lsb)/ Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:8: // - the msb in each byte indicates if there is a continuation byte (msb = 1) On 2011/09/28 16:10:09, r wrote: > s/msb/most significant bit (msb)/ > s/byte/output byte/ Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:8: // - the msb in each byte indicates if there is a continuation byte (msb = 1) On 2011/09/28 16:10:09, r wrote: > s/msb/most significant bit (msb)/ > s/byte/output byte/ Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:10: // encoding (see PutVarint) On 2011/09/28 16:10:09, r wrote: > i see nothing about zig-zag in the comments for PutVarint. > just tell me the encoding here, please. Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:13: // At most 10 bytes are needed for 64bit values. The encoding could be more On 2011/09/28 16:10:09, r wrote: > s/64bit/64-bit/ Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:14: // dense: A full 64bit value needs an extra byte just to hold bit 63. Instead, On 2011/09/28 16:10:09, r wrote: > ditto Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:19: // format incompatible with a varint encoding for larger numbers (say 128bit). On 2011/09/28 16:10:09, r wrote: > ditt sort of Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:29: // If the buffer is to small, the result is 0. On 2011/09/28 16:10:09, r wrote: > s/to/too/ Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:44: // undefined and the number of bytes is zero. If overflow occured, On 2011/09/28 16:10:09, r wrote: > s/occured/occurred/ Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:52: return x, -i // overflow On 2011/09/28 16:10:09, r wrote: > why not 0, -i > and say so in the comment? > anyway i'd be happier with an error return in all these failures but won't press > for it - changed value from undefined to 0 - made comment clearer - returning an os.Error will make this interface more complicated because it will require an extra (3rd) result since in the non-error case we need the value and the length. Also, e.g. buffer too small may indicate a client to get more bytes and thus is a signal that doesn't necessarily indicate a program/data error. Returning that signal as an os.Error will make that code more complicated. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:63: // If the buffer is to small, the result is 0. On 2011/09/28 16:10:09, r wrote: > s/to/too/ Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:75: // the number of bytes returned is negated. On 2011/09/28 16:10:09, r wrote: > same comments as above. > (be sure to fix "occured") Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:85: // WriteUvarint writes a uint64 to w. On 2011/09/28 16:10:09, r wrote: > WriteUvarint encodes x and writes the result to w. > same below. Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:93: var overflow = os.NewError("overflow: not a 64bit integer") On 2011/09/28 10:09:03, nigeltao wrote: > I'd make the string "binary: varint overflows a 64bit integer". The "binary" > prefix gives a hint for what package to look for when encountering an error. Done. http://codereview.appspot.com/5146048/diff/2010/src/pkg/encoding/binary/varin... src/pkg/encoding/binary/varint.go:95: // ReadUvarint reads a uint64 from r. On 2011/09/28 16:10:09, r wrote: > ReadUvarint reads an encoded unsigned integer from r and returns it as a uint64. > similarly below. Done.
Sign in to reply to this message.
http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:15: // are written as 2*(^x) + 1; i.e., negative numbers are complemented s/i.e./that is/ http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:16: // and the sign is encoded in bit 0. and whether to complement is encoded in bit 0. http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:20: // be more dense: A full 64-bit value needs an extra byte just to hold bit 63. s/A/a/ (not starting a sentence) http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:33: const MaxVarintLen = 10 this is too generic a name. i'd do const MaxVarintLen64 = 10 const MaxVarintLen32 = 5 (or whatever) and so on for 16 and 8. why not? http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:50: // number of bytes read (> 0). If an error occured, the value is 0 s/occured/occurred/ http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:51: // and the number of bytes n is <= 0 with the following meaning: s/ the following/
Sign in to reply to this message.
PTAL http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:15: // are written as 2*(^x) + 1; i.e., negative numbers are complemented On 2011/09/28 18:33:49, r wrote: > s/i.e./that is/ Done. http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:16: // and the sign is encoded in bit 0. On 2011/09/28 18:33:49, r wrote: > and whether to complement is encoded in bit 0. Done. http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:20: // be more dense: A full 64-bit value needs an extra byte just to hold bit 63. On 2011/09/28 18:33:49, r wrote: > s/A/a/ (not starting a sentence) Done. http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:33: const MaxVarintLen = 10 On 2011/09/28 18:33:49, r wrote: > this is too generic a name. i'd do > const MaxVarintLen64 = 10 > const MaxVarintLen32 = 5 (or whatever) > and so on for 16 and 8. why not? Done. http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:50: // number of bytes read (> 0). If an error occured, the value is 0 On 2011/09/28 18:33:49, r wrote: > s/occured/occurred/ Done. http://codereview.appspot.com/5146048/diff/14007/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:51: // and the number of bytes n is <= 0 with the following meaning: On 2011/09/28 18:33:49, r wrote: > s/ the following/ Done.
Sign in to reply to this message.
LGTM anyone else?
Sign in to reply to this message.
LGTM I only looked at the API.
Sign in to reply to this message.
LGTM http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:7: // This file implements "varint" encoding of 64bit integers. s/64bit/64-bit/ http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:10: // least significant bit (lsb) I would have said s/bit/7 bits/. http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:22: // know there can't be more then 64 bits. This is a trivial improvement and s/then/than/ http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:52: for x >= 0x80 { there are bit tricks you could do to avoid needing this loop, but this works. http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:72: if i > 9 || i == 9 && b > 1 { seems like this should use MaxVarintLen64 http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:130: if i > 9 || i == 9 && b > 1 { use MaxVarintLen64?
Sign in to reply to this message.
LGTM Also, the test only checks round trips, but it would be good to have a few tests for the actual bytes. You can probably steal some good sample data from Nigel's code: http://code.google.com/p/snappy-go/source/browse/varint/varint_test.go http://code.google.com/p/snappy-go/source/browse/varint/zigzag/zigzag_test.go
Sign in to reply to this message.
http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... File src/pkg/encoding/binary/varint.go (right): http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:7: // This file implements "varint" encoding of 64bit integers. On 2011/09/29 00:32:14, dsymonds wrote: > s/64bit/64-bit/ Done. http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:10: // least significant bit (lsb) On 2011/09/29 00:32:14, dsymonds wrote: > I would have said s/bit/7 bits/. changed to: starting with the least significant bits http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:22: // know there can't be more then 64 bits. This is a trivial improvement and On 2011/09/29 00:32:14, dsymonds wrote: > s/then/than/ Done. http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:52: for x >= 0x80 { On 2011/09/29 00:32:14, dsymonds wrote: > there are bit tricks you could do to avoid needing this loop, but this works. Not sure those would be faster. But I tweaked the code a bit and there are now only half the number of iterations needed :-) http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:72: if i > 9 || i == 9 && b > 1 { On 2011/09/29 00:32:14, dsymonds wrote: > seems like this should use MaxVarintLen64 yes, but then in b > 1 the 1 should also be computed in terms of original constants. I played with this: const ( _S = 7 // 7 bits per byte _L = 1<<_S _M = _L-1 ) and used _S instead of 7, _L instead of 0x80, and _M instead of 0x7f. The Max constants can all be computed as well from _S. The tests can be expressed in terms of these constants. In fact, the entire encoding becomes changeable, one could e.g. set _S to 5 and it still works. But the code becomes not more readable. I think either one goes all the way or we leave these constants. Opting for the current approach. There's no point in changing the encoding. http://codereview.appspot.com/5146048/diff/22001/src/pkg/encoding/binary/vari... src/pkg/encoding/binary/varint.go:130: if i > 9 || i == 9 && b > 1 { On 2011/09/29 00:32:14, dsymonds wrote: > use MaxVarintLen64? see above.
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=39de8f482ea8 *** encoding/binary: support for varint encoding R=rsc, r, nigeltao, r, dsymonds CC=golang-dev http://codereview.appspot.com/5146048
Sign in to reply to this message.
I don't remember seeing this detail of the API when I looked. Maybe I just missed it: // PutUvarint encodes a uint64 into buf and returns the number of bytes written. // If the buffer is too small, the result is the negated number of bytes required // (that is, -PutUvarint(nil, x) is the number of bytes required to encode x). The negative size return on error seems overly fussy to me; it belongs in a C program, not a Go program. I would just require that the buffer is large enough, and let the slice writes panic if not. The max is defined. Compare with utf8.EncodeRune. Russ
Sign in to reply to this message.
If the max is not enough, a separate func VarintSize(x int64) int and UvarintSize(x uint64) int would be fine too and would still avoid needing to overload the return value. Russ
Sign in to reply to this message.
There was always an error return (0) if the buffer was too small, in consistency with the read routines. In a second step I added the negative length because the read routines also return the negative length in error cases (overflow). It doesn't cost when not used. I was contemplating just crashing, but the implementation made the error return free. Anyway, simplified, as consistency w/ EncodeRune is compelling. See CL 5163041. - gri On Thu, Sep 29, 2011 at 6:59 AM, Russ Cox <rsc@golang.org> wrote: > If the max is not enough, a separate func VarintSize(x int64) int > and UvarintSize(x uint64) int would be fine too and > would still avoid needing to overload the return value. > > Russ >
Sign in to reply to this message.
|