Description: encoding/json: add "overflow" struct tag option
Fixes issue 6213.
Patch Set 1 #
Total comments: 6
Patch Set 2 : diff -r c98067b47ab0 https://code.google.com/p/go #
Total messages: 16
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go
https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/encode.go
File src/pkg/encoding/json/encode.go (right):

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/encode.g...
src/pkg/encoding/json/encode.go:255: panic("endOverflow called before startOverflow")
what if e.Len() is zero at startOverflow? is that possible? i think it is if the first field is unrecognized.

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/example_...
File src/pkg/encoding/json/example_test.go (right):

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/example_...
src/pkg/encoding/json/example_test.go:83: "Animal": "rabbit"
did you know bimmler's address was rabbit!bimmler?
https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/decode.go
File src/pkg/encoding/json/decode.go (right):

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/decode.g...
src/pkg/encoding/json/decode.go:64: // When unmarshalling into struct values, a struct field of map type with the
unmarshaling
https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/decode.go
File src/pkg/encoding/json/decode.go (right):

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/decode.g...
src/pkg/encoding/json/decode.go:64: // When unmarshalling into struct values, a struct field of map type with the
On 2013/08/29 05:17:16, r wrote:
> unmarshaling
Done.

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/encode.go
File src/pkg/encoding/json/encode.go (right):

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/encode.g...
src/pkg/encoding/json/encode.go:255: panic("endOverflow called before startOverflow")
On 2013/08/29 05:16:51, r wrote:
> what if e.Len() is zero at startOverflow? is that possible? i think it is if the
> first field is unrecognized.
nope, there's always an opening curly brace

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/example_...
File src/pkg/encoding/json/example_test.go (right):

https://codereview.appspot.com/13324045/diff/1/src/pkg/encoding/json/example_...
src/pkg/encoding/json/example_test.go:83: "Animal": "rabbit"
On 2013/08/29 05:16:51, r wrote:
> did you know bimmler's address was rabbit!bimmler?
yes :-)
On 2013/08/29 04:46:55, adg wrote:
> Hello golang-dev@googlegroups.com,
>
> I'd like you to review this change to
> https://code.google.com/p/go

NOT LGTM

benchmark                      old ns/op    new ns/op    delta
BenchmarkCodeEncoder            43353324     46277117   +6.74%
BenchmarkCodeMarshal            46923347     50917386   +8.51%
BenchmarkCodeDecoder           188968676    305382293  +61.60%
BenchmarkCodeUnmarshal         184834994    302217744  +63.51%
BenchmarkCodeUnmarshalReuse    175378432    292963951  +67.05%
BenchmarkUnmarshalString            1186         1113   -6.16%
BenchmarkUnmarshalFloat64           1018          994   -2.36%
BenchmarkUnmarshalInt64              914          954   +4.38%
BenchmarkSkipValue               3414183      2936396  -13.99%
BenchmarkEncoderEncode              1614         1611   -0.19%

benchmark                      old MB/s     new MB/s     speedup
BenchmarkCodeEncoder              44.76        41.93       0.94x
BenchmarkCodeMarshal              41.35        38.11       0.92x
BenchmarkCodeDecoder              10.27         6.35       0.62x
BenchmarkCodeUnmarshal            10.50         6.42       0.61x
BenchmarkSkipValue                81.95        95.28       1.16x

benchmark                      old allocs   new allocs   delta
BenchmarkCodeEncoder                    1            1    0.00%
BenchmarkCodeMarshal                   17           17    0.00%
BenchmarkCodeDecoder               180415       182079    0.92%
BenchmarkCodeUnmarshal             195430       195439    0.00%
BenchmarkCodeUnmarshalReuse        180382       182063    0.93%
BenchmarkUnmarshalString                2            2    0.00%
BenchmarkUnmarshalFloat64               2            2    0.00%
BenchmarkUnmarshalInt64                 2            2    0.00%
BenchmarkSkipValue                     32           32    0.00%
BenchmarkEncoderEncode                  2            2    0.00%

benchmark                      old bytes    new bytes    delta
BenchmarkCodeEncoder                   12           12    0.00%
BenchmarkCodeMarshal              4578344      4578360    0.00%
BenchmarkCodeDecoder              3952156      5100454   29.05%
BenchmarkCodeUnmarshal            4283600      4285227    0.04%
BenchmarkCodeUnmarshalReuse       2944364      3094734    5.11%
BenchmarkUnmarshalString              276          276    0.00%
BenchmarkUnmarshalFloat64             268          268    0.00%
BenchmarkUnmarshalInt64               268          268    0.00%
BenchmarkSkipValue                   2291         2291    0.00%
BenchmarkEncoderEncode                221          237    7.24%

Performance is severely affected by this CL, reducing encoding speed by 7-8% and almost halving decoding speed (which is already dog slow). Also, memory consumption for decoding is increased by 30%. These are unacceptable tradeoffs, especially when the use cases this patch addresses are very limited. If you're interested in JSON keys that might or might not be there, you're better off adding all the fields to your struct. In the rare case that the JSON you're parsing has arbitrary keys that you can't know beforehand, you can always decode into a map[string]interface{} and inspect the keys later.

On the other hand, if the performance issues were to be fixed, I think the name "overflow" is a poor choice, since most people would think it's related to numeric overflows (e.g. decoding 257 into a uint8). The "inline" tag used by bson is much better, since it means that the field's subfields should be treated as if they were directly included in the container struct. This makes sense both when encoding (the map keys are interpreted as if they were fields in the container struct) and when decoding (a key that doesn't match a struct field would be added to the map).

Regards,
Alberto
Thanks for running the benchmarks. I agree that this patch is a non-starter with this kind of performance degradation.

I suspect the issues may be caused by the refactoring I did, which added more function calls to the critical path. I'll take a look at optimizing this. In theory it should have no sizable effect on performance.

On 30 August 2013 09:58, <alberto.garcia.hierro@gmail.com> wrote:
> NOT LGTM
>
> [benchmark results and discussion quoted in full in the previous message]
On 2013/08/30 00:02:10, adg wrote:
> Thanks for running the benchmarks. I agree that this patch is a non-starter
> with this kind of performance degradation.
>
> I suspect the issues may be caused by the refactoring I did, which added
> more function calls to the critical path.

I checked your code and that's very likely to be the cause. We have an internal high performance fork of encoding/json, and one of the things we did was remove as many branches and function calls as possible from the main loop in (*decodeState).object (we ended up with two loops, one for maps and one for structs). I don't remember the exact numbers, but it gave us a nice speed-up.

On the other hand, wouldn't it be better to implement this more generally, so it would work for non-maps too? I think you might be able to implement it by just modifying typeFields() and adding a special case when decoding structs if there's a map tagged with the "overflow" option.

Regards,
Alberto
On 30 August 2013 10:31, <alberto.garcia.hierro@gmail.com> wrote:
> On the other hand, wouldn't it be better to implement this more
> generally, so it would work for non-maps too? I think you might be able
> to implement it by just modifying typeFields() and adding a special case
> when decoding structs if there's a map tagged with the "overflow"
> option.

I had done that at first, but the result was more complicated than the current change, and I can't think of a possible use for an overflow struct when embedding works just as well.
On 30 August 2013 10:31, <alberto.garcia.hierro@gmail.com> wrote:
> I checked your code and that's very likely to be the cause. We have an
> internal high performance fork of encoding/json and one of the things we
> did was removing as many branches and function calls as possible from
> the main loop in (*decodeState).object (we ended up with two loops, one
> for maps and one for structs). I don't remember the exact numbers, but
> it gave us a nice speed-up.

I'd be interested to see those changes. Want to send a CL? (For discussion, if not submission. :)
On 2013/08/30 00:34:21, adg wrote:
> I'd be interested to see those changes. Want to send a CL? (For discussion,
> if not submission. :)

Not right now, sorry. Our decoder currently uses a bunch of tricks depending on the unsafe package and lacks "safe" fallbacks for environments which can't use package unsafe (although we plan to add them before releasing this code). It also needs a bit more testing and cleaning up before it can be published. We will probably release it in the coming months along with our web framework.

Furthermore, when we tried to submit some of our changes for inclusion, Rob and Russ made it very clear that they found our approach to optimizing encoding/json inappropriate for the standard library, where maintainability is preferred over performance. Also, our code has diverged so much from encoding/json that it's basically a reimplementation of its public interface, so I'm not sure it could be of any help.

I can, however, tell you what we did at a high level, in case you find it helpful. Off the top of my head, it was:

- Don't check the JSON for validity beforehand, but rather check it as it's being parsed, without altering the public function semantics. This sped up parsing valid JSON considerably, while slowing down parsing invalid JSON. I'd say 99.999% of the JSON our apps parse is valid, so optimizing for the common case (valid JSON) made more sense to us (people who reviewed the CL I sent with this change didn't agree with that).

- Rewrite the scanner to be event based, rather than byte based. The current scanner casts every byte it's fed to int, causing a lot of unneeded MOVZX instructions or equivalent (we were told that this didn't make a difference big enough to warrant a code change; we beg to differ, because a lot of small performance improvements eventually add up to a significant one), and requires a function call for every character in the input. Function calls are very slow in Go (compared to C), so we reduced them by making the scanner return a constant indicating the type of the next event (beginString, beginNumber, beginObject, endObject, comma, etc.) and then having specific functions for each event (e.g. scanNull, scanTrue, scanStringBytes, scanComma...).

- Change as many functions as possible to receive a *reflect.Value rather than a reflect.Value. Passing a pointer translates to a LEAQ and a MOVQ, while passing a reflect.Value generates 6 MOVQ (3 from the stack and another 3 to the stack). Functions in decode.go call each other many times while decoding JSON, and those extra trips to memory quickly add up.

- Remove as many function calls as possible from the hot code paths. This meant manually duplicating a bit of code, but that's OK for us.

- Remove as many branches as possible from hot code paths (e.g. the current code in object() checks whether the value is a map or a struct on every iteration rather than before beginning the loop, and array() does much the same with reflect.Array and reflect.Slice).

- Use assembly to look for the end of strings, while also checking whether they have to be escaped (and if so, from where in the string escaping needs to start), iterating over them just once. unquote() is very expensive, so the more often you can avoid it, the better (and, truth be told, most strings in JSON don't need to be unquoted: most people use only ASCII for object keys, which represent a high percentage of the strings in a JSON payload).

- Use "unsafe" conversions from []byte to string when appropriate. This avoids a lot of copying and reduces garbage significantly, and the conversions are not really unsafe, since the scanner holds a reference to the underlying []byte, which prevents any invalid memory access. The worst thing that might happen is that the JSON won't be correctly decoded if the underlying []byte is modified while hashing or comparing the "unsafe" string, but if you're changing the JSON data at the same time you're parsing it, all bets are off (we were also told that this wasn't OK for the standard library).

- Cache JSON key to struct field lookups in decodeState, which allows lockless access to the cache. strings.EqualFold is slower than the bad guy's horse.

These changes gave us around 5x-6x more performance while decoding, but keep in mind that there are still a few tests that don't pass, so there might be some small performance decreases when we fix those corner cases.

Regards,
Alberto
NOT LGTM

Too late, and unclear that it's worthwhile anyway. If you need custom stuff, there is always json.RawMessage.
On 4 September 2013 07:13, <rsc@golang.org> wrote:
> NOT LGTM
>
> Too late, and unclear that it's worthwhile anyway. If you need custom
> stuff, there is always json.RawMessage.

Many people have re-implemented this same functionality over and over again. Sure, it's a convenience that can be worked around, but it's also something that people want. If it's too late for 1.2 that's fine, but I don't think this feature should be dismissed out of hand.
On Tue, Sep 3, 2013 at 8:20 PM, Andrew Gerrand <adg@golang.org> wrote:
> Many people have re-implemented this same functionality over and over
> again. Sure, it's a convenience that can be worked around, but it's also
> something that people want.

I don't believe that's a useful mode of argument. Generics are also something that people want.

> If it's too late for 1.2 that's fine, but I don't think this feature
> should be dismissed out of hand.

Okay, let's postpone this discussion until 1.2 is out. You're basically asking for protocol buffers' unrecognized fields support. I am not convinced.

Russ
R=rsc@golang.org (assigned by rsc@google.com)
Replacing golang-dev with golang-codereviews.