|
|
Descriptionencoding/json: speed up decoding
Don't make copies of keys while decoding, and don't use the
expensive strings.EqualFold when it's not necessary. Instead,
note in the existing field cache what algorithm to use to
check fold equality... most keys are just ASCII letters.
benchmark old ns/op new ns/op delta
BenchmarkCodeDecoder 137074314 103974418 -24.15%
benchmark old MB/s new MB/s speedup
BenchmarkCodeDecoder 14.16 18.66 1.32x
Update Issue 6496
Patch Set 1 #Patch Set 2 : diff -r 351b6fe0ae36 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 351b6fe0ae36 https://go.googlecode.com/hg/ #Patch Set 4 : diff -r 351b6fe0ae36 https://go.googlecode.com/hg/ #Patch Set 5 : diff -r 351b6fe0ae36 https://go.googlecode.com/hg/ #
Total comments: 16
Patch Set 6 : diff -r 878fc73cc5e4 https://go.googlecode.com/hg/ #
Total comments: 4
Patch Set 7 : diff -r 878fc73cc5e4 https://go.googlecode.com/hg/ #
Total comments: 4
Patch Set 8 : diff -r 878fc73cc5e4 https://go.googlecode.com/hg/ #
Total comments: 5
Patch Set 9 : diff -r 38b64a8de1a5 https://go.googlecode.com/hg/ #Patch Set 10 : diff -r 38b64a8de1a5 https://go.googlecode.com/hg/ #
MessagesTotal messages: 28
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
I haven't fully grokked this yet, but it sounds like you are assuming that if the field is only ascii then you can use ascii-fold-equivalence for the input. But that's not true: the JSON key might be non-ascii Unicode that is fold-equivalent to the ascii. This can only happen for a few letters, but it definitely can happen.
Sign in to reply to this message.
On Wed, Sep 25, 2013 at 10:12 AM, Russ Cox <rsc@golang.org> wrote: > I haven't fully grokked this yet, but it sounds like you are assuming that > if the field is only ascii then you can use ascii-fold-equivalence for the > input. But that's not true: the JSON key might be non-ascii Unicode that is > fold-equivalent to the ascii. This can only happen for a few letters, but > it definitely can happen. > For K and S apparently: http://play.golang.org/p/tTxjOc0OGo I'll adjust the algorithm selection.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Please "hg upload" again, getting "old chunk mismatch"
Sign in to reply to this message.
Please hold off until after Go 1.2
Sign in to reply to this message.
PTAL now that Go 1.2 is out
Sign in to reply to this message.
LGTM but I'm not a unicode expert Remove "Probably too late for Go 1.2, but don't want to forget it." from the CL description.
Sign in to reply to this message.
https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:19: // 4) simpleLetterEqualFold, if s is all simple (not 'k' or 's') ascii letters. s/ascii/ASCII/ https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:31: // See http://play.golang.org/p/tTxjOc0OGo not super helpful. keep the link, but before that say // S maps to s and to U+017F 'ſ' Latin small letter long s // k maps to K and to U+212A 'K' Kelvin sign perhaps that information should be up top https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:46: // requiring SimpleFold on the bytes in t. // See comments on foldFunc. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:54: case 's', 'S', 'k', 'K': just switch on sb and stuff the appropriate mapping. then put a check in the tests that you get the same answer as unicode.SimpleFold. it's a lot of code and will get run often. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:72: if sb != tb && sb & ^byte(32) != tb & ^byte(32) { const caseMask ^byte(0x20) // Mask to ignore case in ASCII. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:82: // when s doesn't contain 'k', 'K', 's', or 'S' but is otherwise all ASCII. // simpleLetterEqualFold is a specialization of bytes.EqualFold for use when s is all ASCII and doesn't... https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:88: if b & ^byte(32) != t[i] & ^byte(32) { use the const. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:105: if sb & ^byte(32) != tb & ^byte(32) { use the const
Sign in to reply to this message.
https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:19: // 4) simpleLetterEqualFold, if s is all simple (not 'k' or 's') ascii letters. On 2013/12/18 04:35:30, r wrote: > s/ascii/ASCII/ Done. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:31: // See http://play.golang.org/p/tTxjOc0OGo On 2013/12/18 04:35:30, r wrote: > not super helpful. > keep the link, but before that say > // S maps to s and to U+017F 'ſ' Latin small letter long s > // k maps to K and to U+212A 'K' Kelvin sign > > perhaps that information should be up top Done. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:46: // requiring SimpleFold on the bytes in t. On 2013/12/18 04:35:30, r wrote: > // See comments on foldFunc. Done. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:54: case 's', 'S', 'k', 'K': On 2013/12/18 04:35:30, r wrote: > just switch on sb I am? > and stuff the appropriate mapping. > then put a check in the tests that you get the same answer as > unicode.SimpleFold. > it's a lot of code and will get run often. I'm not sure I understand. I could also do this in a subsequent CL if you're proposing copying more of utf8/unicode into this package. I'd rather not for not, or at least not lump that together with this. I'd like that to stand on its own merits first. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:72: if sb != tb && sb & ^byte(32) != tb & ^byte(32) { On 2013/12/18 04:35:30, r wrote: > const caseMask ^byte(0x20) // Mask to ignore case in ASCII. Done. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:82: // when s doesn't contain 'k', 'K', 's', or 'S' but is otherwise all ASCII. On 2013/12/18 04:35:30, r wrote: > // simpleLetterEqualFold is a specialization of bytes.EqualFold for use when s > is all ASCII and doesn't... Done. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:88: if b & ^byte(32) != t[i] & ^byte(32) { On 2013/12/18 04:35:30, r wrote: > use the const. Done. https://codereview.appspot.com/13894045/diff/25001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:105: if sb & ^byte(32) != tb & ^byte(32) { On 2013/12/18 04:35:30, r wrote: > use the const Done.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, rsc@golang.org, adg@golang.org, r@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
PTAL Updated numbers in CL description, best of three runs each (but not very variable) with -benchtime=3s on my mac i7. benchmark old ns/op new ns/op delta BenchmarkCodeDecoder 139324606 110571239 -20.64% benchmark old MB/s new MB/s speedup BenchmarkCodeDecoder 13.93 17.55 1.26x On Tue, Dec 17, 2013 at 9:24 PM, <bradfitz@golang.org> wrote: > Hello golang-dev@googlegroups.com, rsc@golang.org, adg@golang.org, > r@golang.org (cc: golang-dev@googlegroups.com), > > Please take another look. > > > https://codereview.appspot.com/13894045/ >
Sign in to reply to this message.
lgtm https://codereview.appspot.com/13894045/diff/27001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/27001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:1: // Copyright 2013 The Go Authors. All rights reserved. single space? https://codereview.appspot.com/13894045/diff/27001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:37: upper := b & ^byte(32) caseMask
Sign in to reply to this message.
PTAL https://codereview.appspot.com/13894045/diff/27001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/27001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:1: // Copyright 2013 The Go Authors. All rights reserved. On 2013/12/18 06:04:08, mikio wrote: > single space? Sure. I just copied another file. https://codereview.appspot.com/13894045/diff/27001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:37: upper := b & ^byte(32) On 2013/12/18 06:04:08, mikio wrote: > caseMask It's not being used the same way as documented and used elsewhere in this file, but sure. Done.
Sign in to reply to this message.
i might be missing something but consider my approach https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:64: switch sb { what i mean is that if you know sb is 'k', say, you know the small list of characters that t must have. just look for them: switch sb { case 'k': tb must be k, K, or '\u221'. etc. } you've already exposed pretty deep details of the unicode package just by singling out k and s; it makes little sense to me to stop there when the code could be so much simpler if you just wrote the trivial code for the four characters. i think you'll find it's really easy. another thing you can do is this. s is known to be ascii. if t is also ascii, it's trivial: tb := t[0] if tb < utf8.RuneSelf { // Both are ASCII. Easy test. if sb&caseMask != tb&caseMask { return false } } // sb is ASCII and t is not. t must be either kelvin sign or long s; sb must be s, S, k, or K tr, size := ... if size etc. return false switch sb{ case 's', 'S': return tr == long s case 'k', 'K': return tr == kelvin } return false: https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:83: if sb != tb && sb&caseMask != tb&caseMask { why do you need the first condition given the second?
Sign in to reply to this message.
obviously there's an "else continue" missing in the ascii case. On Tue, Dec 17, 2013 at 11:04 PM, <r@golang.org> wrote: > i might be missing something but consider my approach > > > https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold.go > File src/pkg/encoding/json/fold.go (right): > > https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold... > src/pkg/encoding/json/fold.go:64: switch sb { > what i mean is that if you know sb is 'k', say, you know the small list > of characters that t must have. just look for them: > switch sb { > case 'k': > tb must be k, K, or '\u221'. > etc. > } > > you've already exposed pretty deep details of the unicode package just > by singling out k and s; it makes little sense to me to stop there when > the code could be so much simpler if you just wrote the trivial code for > the four characters. i think you'll find it's really easy. > > another thing you can do is this. s is known to be ascii. if t is also > ascii, it's trivial: > > tb := t[0] > if tb < utf8.RuneSelf { > // Both are ASCII. Easy test. > if sb&caseMask != tb&caseMask { > return false > } > } > // sb is ASCII and t is not. t must be either kelvin sign or long s; sb > must be s, S, k, or K > tr, size := ... > if size etc. return false > switch sb{ > case 's', 'S': > return tr == long s > case 'k', 'K': > return tr == kelvin > } > return false: > > https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold... > src/pkg/encoding/json/fold.go:83: if sb != tb && sb&caseMask != > tb&caseMask { > why do you need the first condition given the second? > > https://codereview.appspot.com/13894045/
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, rsc@golang.org, adg@golang.org, r@golang.org, mikioh.mikioh@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:64: switch sb { On 2013/12/18 07:04:23, r wrote: > what i mean is that if you know sb is 'k', say, you know the small list of > characters that t must have. just look for them: > switch sb { > case 'k': > tb must be k, K, or '\u221'. > etc. > } > > you've already exposed pretty deep details of the unicode package just by > singling out k and s; it makes little sense to me to stop there when the code > could be so much simpler if you just wrote the trivial code for the four > characters. i think you'll find it's really easy. > > another thing you can do is this. s is known to be ascii. if t is also ascii, > it's trivial: > > tb := t[0] > if tb < utf8.RuneSelf { > // Both are ASCII. Easy test. > if sb&caseMask != tb&caseMask { > return false > } > } > // sb is ASCII and t is not. t must be either kelvin sign or long s; sb must be > s, S, k, or K > tr, size := ... > if size etc. return false > switch sb{ > case 's', 'S': > return tr == long s > case 'k', 'K': > return tr == kelvin > } > return false: > > > Done. https://codereview.appspot.com/13894045/diff/47001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:83: if sb != tb && sb&caseMask != tb&caseMask { On 2013/12/18 07:04:23, r wrote: > why do you need the first condition given the second? Code is gone now.
Sign in to reply to this message.
https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:90: t = t[size:] this line is unreachable https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold... File src/pkg/encoding/json/fold_test.go (right): https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold_test.go:37: {asciiEqualFold, "aa@", "aa`", false}, // verify 0x40 and 0x60 aren't case-equivalent you now need cross-checking for all the nasty s and k cases against SimpleFold
Sign in to reply to this message.
https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:90: t = t[size:] On 2013/12/18 14:47:54, r wrote: > this line is unreachable The S and K cases fall through here, and this advances t by 2 bytes. (the size variable is kinda redundant)
Sign in to reply to this message.
https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold.go File src/pkg/encoding/json/fold.go (right): https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold.go:90: t = t[size:] right, got it.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, rsc@golang.org, adg@golang.org, r@golang.org, mikioh.mikioh@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold... File src/pkg/encoding/json/fold_test.go (right): https://codereview.appspot.com/13894045/diff/67001/src/pkg/encoding/json/fold... src/pkg/encoding/json/fold_test.go:37: {asciiEqualFold, "aa@", "aa`", false}, // verify 0x40 and 0x60 aren't case-equivalent On 2013/12/18 14:47:54, r wrote: > you now need cross-checking for all the nasty s and k cases against SimpleFold Done. Now comprehensive tests for all funcs.
Sign in to reply to this message.
On Wed, Dec 18, 2013 at 7:21 PM, <bradfitz@golang.org> wrote: > > https://codereview.appspot.com/13894045/diff/67001/src/ > pkg/encoding/json/fold_test.go > File src/pkg/encoding/json/fold_test.go (right): > > https://codereview.appspot.com/13894045/diff/67001/src/ > pkg/encoding/json/fold_test.go#newcode37 > src/pkg/encoding/json/fold_test.go:37: {asciiEqualFold, "aa@", "aa`", > false}, // verify 0x40 and 0x60 aren't case-equivalent > On 2013/12/18 14:47:54, r wrote: > >> you now need cross-checking for all the nasty s and k cases against >> > SimpleFold > > Done. Now comprehensive tests for all funcs. > > ... which found a bug, now fixed: fold_test.go:78: equalFoldRight("x@x", "x`x") = true; want false fold_test.go:78: equalFoldRight("x[x", "x{x") = true; want false fold_test.go:78: equalFoldRight("x\\x", "x|x") = true; want false fold_test.go:78: equalFoldRight("x]x", "x}x") = true; want false fold_test.go:78: equalFoldRight("x^x", "x~x") = true; want false fold_test.go:78: equalFoldRight("x_x", "x\u007fx") = true; want false fold_test.go:78: equalFoldRight("x`x", "x@x") = true; want false fold_test.go:78: equalFoldRight("x{x", "x[x") = true; want false fold_test.go:78: equalFoldRight("x|x", "x\\x") = true; want false fold_test.go:78: equalFoldRight("x}x", "x]x") = true; want false fold_test.go:78: equalFoldRight("x~x", "x^x") = true; want false fold_test.go:78: equalFoldRight("x\u007fx", "x_x") = true; want false
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=28a4d13557a9 *** encoding/json: speed up decoding Don't make copies of keys while decoding, and don't use the expensive strings.EqualFold when it's not necessary. Instead, note in the existing field cache what algorithm to use to check fold equality... most keys are just ASCII letters. benchmark old ns/op new ns/op delta BenchmarkCodeDecoder 137074314 103974418 -24.15% benchmark old MB/s new MB/s speedup BenchmarkCodeDecoder 14.16 18.66 1.32x Update Issue 6496 R=golang-dev, rsc, adg, r, mikioh.mikioh CC=golang-dev https://codereview.appspot.com/13894045
Sign in to reply to this message.
Message was sent while issue was closed.
FYI, effects on json benchmark: http://goperfd.appspot.com/perfdetail?commit=28a4d13557a99de31746a59fd0d07ddc...
Sign in to reply to this message.
Fun. On Fri, Dec 20, 2013 at 1:00 AM, <dvyukov@google.com> wrote: > FYI, effects on json benchmark: > > http://goperfd.appspot.com/perfdetail?commit= > 28a4d13557a99de31746a59fd0d07ddcfeed351b&builder=windows- > amd64&benchmark=json > > > > https://codereview.appspot.com/13894045/ >
Sign in to reply to this message.
|