
encoding/json: speed up the decoding scanner #28923

Closed
mvdan opened this issue Nov 22, 2018 · 13 comments

@mvdan (Member) commented Nov 22, 2018

#5683 covered the overall speed issues of encoding and decoding JSON. It's now closed, since the package has gotten noticeably faster in the last few Go releases. For example, after a series of patches I sent during 1.12:

name           old time/op    new time/op    delta
CodeEncoder-4    7.43ms ± 0%    5.35ms ± 1%  -28.01%  (p=0.002 n=6+6)
CodeDecoder-4    30.8ms ± 1%    27.3ms ± 0%  -11.42%  (p=0.004 n=6+5)

name           old speed      new speed      delta
CodeEncoder-4   261MB/s ± 0%   363MB/s ± 1%  +38.91%  (p=0.002 n=6+6)
CodeDecoder-4  62.9MB/s ± 1%  70.9MB/s ± 1%  +12.71%  (p=0.002 n=6+6)

Notice, however, that decoding is about five times slower than encoding. While it makes sense that decoding is more work, a 5x difference suggests that there are bottlenecks.

Here are the top 10 encoding functions by CPU, as reported by go test -run=- -bench=CodeEncoder -benchtime=5s -cpuprofile=cpu.out:

      flat  flat%   sum%        cum   cum%
     3.93s 10.84% 10.84%     36.07s 99.50%  encoding/json.structEncoder.encode
     3.45s  9.52% 20.36%      3.92s 10.81%  strconv.formatBits
     3.39s  9.35% 29.71%      4.80s 13.24%  strconv.(*extFloat).ShortestDecimal
     2.50s  6.90% 36.61%      2.50s  6.90%  runtime.memmove
     2.41s  6.65% 43.26%      3.18s  8.77%  encoding/json.(*encodeState).string
     2.11s  5.82% 49.08%      2.11s  5.82%  strconv.fmtF
     1.78s  4.91% 53.99%      3.40s  9.38%  bytes.(*Buffer).WriteString
     1.36s  3.75% 57.74%         2s  5.52%  bytes.(*Buffer).WriteByte
     1.34s  3.70% 61.43%      1.34s  3.70%  bytes.(*Buffer).tryGrowByReslice (inline)
     1.32s  3.64% 65.08%      2.31s  6.37%  bytes.(*Buffer).Write

The encoder has been optimized to the point where bytes.Buffer, runtime.memmove, and strconv all appear in the top 10; I can't see any more low-hanging fruit there.

And here are the top 10 decoding functions by CPU, as reported by go test -run=- -bench=CodeDecoder -benchtime=5s -cpuprofile=cpu.out:

      flat  flat%   sum%        cum   cum%
    3190ms 10.74% 10.74%     3240ms 10.91%  encoding/json.stateInString
    3150ms 10.60% 21.34%     8260ms 27.80%  encoding/json.(*Decoder).readValue
    2990ms 10.06% 31.40%     7660ms 25.78%  encoding/json.(*decodeState).scanWhile
    1990ms  6.70% 38.10%    21120ms 71.09%  encoding/json.(*decodeState).object
    1730ms  5.82% 43.92%     1860ms  6.26%  encoding/json.stateEndValue
    1550ms  5.22% 49.14%     2110ms  7.10%  encoding/json.state1
    1350ms  4.54% 53.69%     1350ms  4.54%  encoding/json.unquoteBytes
    1020ms  3.43% 57.12%     1080ms  3.64%  encoding/json.stateBeginValue
     920ms  3.10% 60.22%     3120ms 10.50%  encoding/json.indirect
     720ms  2.42% 62.64%      740ms  2.49%  encoding/json.stateBeginString

The big difference here is that all 10 functions are in the json package itself. In particular, the scanner's state* functions take up half of the list, including number one. And numbers two and three, readValue and scanWhile, are also part of scanning tokens.

There's another issue about speeding up the decoder, #16212. It's about doing the reflect work ahead of time, like the encoder does. However, of that top 10, only object and indirect involve any reflect work, so I doubt that a refactor for #16212 would bring substantial gains while the scanner takes vastly more time than reflection.

So I propose that we refactor or even redesign the scanner to make it faster. The main bottleneck at the moment is the step function, which must be called for every decoded byte. I can imagine that it's slow for a few main reasons:

  • step is a function value, which can be nil - one nil check per byte
  • We can't quickly skip chunks of uninteresting bytes, such as whitespace or long quoted strings with safe characters
  • The calls cannot be inlined, even though many state* functions are very small - one call per byte

The godoc for the step field reads:

// The step is a func to be called to execute the next transition.
// Also tried using an integer constant and a single func
// with a switch, but using the func directly was 10% faster
// on a 64-bit Mac Mini, and it's nicer to read.

I tried using a single function with a switch, and even a function jump table, but neither gave a noticeable speed-up. In fact, both made CodeDecoder 1-5% slower. I presume the manual jump table via a [...]func(*scanner, byte) int array is slow because we still can't get rid of the nil checks. I'd hope that #5496 would help here.
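
For the record, here's a minimal, self-contained sketch of the jump-table shape I mean, with stub states standing in for the real state* functions - an illustration of the approach, not the actual experiment:

package main

import "fmt"

const (
	scanContinue = iota
	scanError
)

// The scanner stores a table index instead of a func value.
type scanner struct {
	step int
}

// Stub states standing in for the real state* functions.
func stateBeginValue(s *scanner, c byte) int { s.step = 1; return scanContinue }
func stateInString(s *scanner, c byte) int   { return scanContinue }

var stepTable = [...]func(*scanner, byte) int{
	stateBeginValue,
	stateInString,
}

func main() {
	s := &scanner{}
	for _, c := range []byte(`"hi"`) {
		// Still one dynamic, non-inlinable call per byte, and the
		// nil check on the table entry can't be removed.
		if stepTable[s.step](s, c) == scanError {
			fmt.Println("syntax error")
			return
		}
	}
	fmt.Println("ok")
}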

I have a few ideas in mind to try to make the work per byte a bit faster, but I'm limited by the "one unknown step call per byte" design, as I tried to explain above. I understand that the current design makes the JSON scanner simpler to reason about, but I wonder if there's anything we can do to make it faster while keeping it reasonably simple.

This issue is to track small speed-ups in the scanner, but also to discuss larger refactors to it. I'm milestoning for 1.13, since at least the small speed-ups can be merged for that release.

/cc @rsc @dsnet @bradfitz @josharian @kevinburke

@gopherbot

Change https://golang.org/cl/150919 mentions this issue: encoding/json: use uint offsets in the decoder

@mvdan (Member Author) commented Nov 22, 2018

Here's a potential idea. Quoted strings are very common in JSON, so we could make the scanner tokenise those with specialised code. Only a few states are involved in strings, and they're hot paths - for example, stateInString and stateBeginString both appear in the top 10 above. I think this would give a very noticeable speed-up, since we could avoid nil checks and inline much of the work.

Of course, the disadvantage is that we break with the "one step call == one byte advanced" rule. I think we need to break that rule, if we hope to make the scanner significantly faster.
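
To make the idea concrete, here's a rough sketch of what the specialised string scanning could look like - names and signatures are illustrative, not a final design:

// scanStringFast advances through a string literal, assuming data[i-1]
// was the opening quote. It returns the index just past the closing
// quote when the string contains only "safe" bytes; simple==false means
// the caller must fall back to the byte-at-a-time state machine.
func scanStringFast(data []byte, i int) (end int, simple bool) {
	for ; i < len(data); i++ {
		switch c := data[i]; {
		case c == '"':
			return i + 1, true
		case c == '\\' || c < 0x20 || c >= 0x80:
			return i, false // escape, control character, or non-ASCII
		}
	}
	return i, false // string not terminated within data
}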

@mvdan (Member Author) commented Nov 23, 2018

I had a realisation yesterday - the decoder scans the input bytes twice, in both Decoder.Decode and Unmarshal, which makes the scanner even more of a bottleneck.

On the bright side, this means we can introduce cheap speed-ups without redesigning the scanner. The tokenization happens during the second scan, at which point there can't be any syntax errors. https://go-review.googlesource.com/c/go/+/151042 uses this for a shortcut, since we can iterate over a string literal's bytes in a much faster way. Forty lines of code give a ~7% speed-up, which I think is worth it.
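
In sketch form, the shortcut relies on the input already being validated - something like this hypothetical helper, which isn't the exact CL code:

// skipValidatedString returns the index just past a string literal's
// closing quote, with i pointing at the first byte after the opening
// quote. It assumes the input already passed the first, validating
// scan, so a closing quote is guaranteed to exist.
func skipValidatedString(data []byte, i int) int {
	for data[i] != '"' {
		if data[i] == '\\' {
			i++ // skip the escaped byte
		}
		i++
	}
	return i + 1
}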

@gopherbot

Change https://golang.org/cl/151042 mentions this issue: encoding/json: speed up string tokenization

@dgryski (Contributor) commented Nov 23, 2018

The new decoder will need to be fuzzed.

@mvdan (Member Author) commented Nov 23, 2018

Agreed. I don't think anyone has been fuzzing the encoding/json package lately. I can take a look at that during the 1.13 cycle, once I'm done with this batch of decoding optimizations.

@gopherbot

Change https://golang.org/cl/151157 mentions this issue: encoding/json: avoid work when unquoting strings

@mvdan (Member Author) commented Nov 25, 2018

Here are the final numbers after the three CLs above:

name           old time/op    new time/op    delta
CodeDecoder-4    31.1ms ± 0%    26.3ms ± 0%  -15.58%  (p=0.002 n=6+6)

name           old speed      new speed      delta
CodeDecoder-4  62.3MB/s ± 0%  73.8MB/s ± 0%  +18.45%  (p=0.002 n=6+6)

I'll leave it here during the freeze - looking forward to feedback and reviews once the 1.13 tree opens. So far, I haven't redesigned or restructured the scanner, I've simply introduced shortcuts in the decoder.

If these kinds of optimizations are welcome, I might look into more of them for 1.13 once this first batch is reviewed.

gopherbot pushed a commit that referenced this issue Apr 3, 2019
Decoder.Decode and Unmarshal actually scan the input bytes twice - the
first time to check for syntax errors and the length of the value, and
the second to perform the decoding.

It's in the second scan that we actually tokenize the bytes. Since
syntax errors aren't a possibility, we can take shortcuts.

In particular, literals such as quoted strings are very common in JSON,
so we can avoid a lot of work by special casing them.

name                  old time/op    new time/op    delta
CodeDecoder-8           10.3ms ± 1%     9.1ms ± 0%  -11.89%  (p=0.002 n=6+6)
UnicodeDecoder-8         342ns ± 0%     283ns ± 0%  -17.25%  (p=0.000 n=6+5)
DecoderStream-8          239ns ± 0%     230ns ± 0%   -3.90%  (p=0.000 n=6+5)
CodeUnmarshal-8         11.0ms ± 0%     9.8ms ± 0%  -11.45%  (p=0.002 n=6+6)
CodeUnmarshalReuse-8    10.3ms ± 0%     9.0ms ± 0%  -12.72%  (p=0.004 n=5+6)
UnmarshalString-8        104ns ± 0%      92ns ± 0%  -11.35%  (p=0.002 n=6+6)
UnmarshalFloat64-8      93.2ns ± 0%    87.6ns ± 0%   -6.01%  (p=0.010 n=6+4)
UnmarshalInt64-8        74.5ns ± 0%    71.5ns ± 0%   -3.91%  (p=0.000 n=5+6)

name                  old speed      new speed      delta
CodeDecoder-8          189MB/s ± 1%   214MB/s ± 0%  +13.50%  (p=0.002 n=6+6)
UnicodeDecoder-8      40.9MB/s ± 0%  49.5MB/s ± 0%  +20.96%  (p=0.002 n=6+6)
CodeUnmarshal-8        176MB/s ± 0%   199MB/s ± 0%  +12.93%  (p=0.002 n=6+6)

Updates #28923.

Change-Id: I7a5e2aef51bd4ddf2004aad24210f6f50e01eaeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/151042
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

Change https://golang.org/cl/170939 mentions this issue: encoding/json: remove a bounds check in readValue

gopherbot pushed a commit that referenced this issue Apr 13, 2019
readValue is a hot function, clocking in at ~13% flat CPU use in
CodeDecoder. In particular, looping over the bytes is slow. That's
partially because the code contains a bounds check at the start of the
loop.

The source of the problem is that scanp is a signed integer, and comes
from a field, so the compiler doesn't know that it's non-negative. Help
it with a simple and comparatively cheap hint.

While at it, use scanp as the index variable directly, removing the need
for a duplicate index variable which is later added back into scanp.

name           old time/op    new time/op    delta
CodeDecoder-8    11.3ms ± 1%    11.2ms ± 1%  -0.98%  (p=0.000 n=9+9)

name           old speed      new speed      delta
CodeDecoder-8   172MB/s ± 1%   174MB/s ± 1%  +0.99%  (p=0.000 n=9+9)

Updates #28923.

Change-Id: I138f83babdf316fc97697cc18f595c3403c1ddb7
Reviewed-on: https://go-review.googlesource.com/c/go/+/170939
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
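
For illustration, the hint described in the commit above boils down to something like the following standalone example (hypothetical code, not the actual CL):

// Ruling out negative values up front lets the compiler's prove pass
// drop the lower-bound check when indexing inside the loop, since the
// loop condition already establishes the upper bound.
func sumFrom(buf []byte, scanp int) (n int) {
	if scanp < 0 {
		panic("negative offset") // unreachable in practice; a hint only
	}
	for scanp < len(buf) {
		n += int(buf[scanp]) // lower-bound check elided
		scanp++
	}
	return n
}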
gopherbot pushed a commit that referenced this issue Apr 22, 2019
We can work out how many bytes can be unquoted trivially in
rescanLiteral, which already iterates over a string's bytes.

Removing the extra loop in unquoteBytes simplifies the function and
speeds it up, especially when decoding simple strings, which are common.

While at it, we can remove unnecessary checks like len(s)<2 and
s[0]=='"'. Add a comment explaining why.

name           old time/op    new time/op    delta
CodeDecoder-8    11.2ms ± 0%    11.1ms ± 1%  -1.63%  (p=0.000 n=9+10)

name           old speed      new speed      delta
CodeDecoder-8   173MB/s ± 0%   175MB/s ± 1%  +1.66%  (p=0.000 n=9+10)

Updates #28923.

Change-Id: I2436a3a7f8148a2f7a6a4cdbd7dec6b32ef5e20c
Reviewed-on: https://go-review.googlesource.com/c/go/+/151157
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
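
Sketched out with hypothetical names, the idea in the commit above looks roughly like this - the scan already touches every byte of the literal, so it can record whether unquoting needs any real work:

// safePrefix counts how many bytes after the opening quote are
// trivially unquotable; the real scan computes this as a side effect
// of rescanLiteral. s includes both surrounding quotes.
func safePrefix(s []byte) int {
	n := 0
	for _, c := range s[1:] {
		if c == '"' || c == '\\' || c < 0x20 || c >= 0x80 {
			break
		}
		n++
	}
	return n
}

// If every byte between the quotes was safe, unquoting is a sub-slice.
func unquoteTrivial(s []byte, safe int) ([]byte, bool) {
	if safe == len(s)-2 {
		return s[1 : len(s)-1], true
	}
	return nil, false // fall back to the full unquoting routine
}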
@gopherbot

Change https://golang.org/cl/172918 mentions this issue: encoding/json: index names for the struct decoder

gopherbot pushed a commit that referenced this issue Apr 23, 2019
In the common case, structs have a handful of fields and most inputs
match struct field names exactly.

The previous code would do a linear search over the fields, stopping at
the first exact match, and otherwise using the first case insensitive
match.

This is unfortunate, because it means that for the common case, we'd do
a linear search with bytes.Equal. Even for structs with only two or
three fields, that is pretty wasteful.

Even worse, until the exact match was found via the linear search, all preceding fields would run their equalFold functions, which aren't cheap even in the simple case.

Instead, cache a map along with the field list that indexes the fields
by their name. This way, a case sensitive field search doesn't involve a
linear search, nor does it involve any equalFold func calls.

This patch should also slightly speed up cases where there's a case
insensitive match but not a case sensitive one, as then we'd avoid
calling bytes.Equal on all the fields. Though that's not a common case,
and there are no benchmarks for it.

name           old time/op    new time/op    delta
CodeDecoder-8    11.0ms ± 0%    10.6ms ± 1%  -4.42%  (p=0.000 n=9+10)

name           old speed      new speed      delta
CodeDecoder-8   176MB/s ± 0%   184MB/s ± 1%  +4.62%  (p=0.000 n=9+10)

name           old alloc/op   new alloc/op   delta
CodeDecoder-8    2.28MB ± 0%    2.28MB ± 0%    ~     (p=0.725 n=10+10)

name           old allocs/op  new allocs/op  delta
CodeDecoder-8     76.9k ± 0%     76.9k ± 0%    ~     (all equal)

Updates #28923.

Change-Id: I9929c1f06c76505e5b96914199315dbdaae5dc76
Reviewed-on: https://go-review.googlesource.com/c/go/+/172918
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
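
In outline, the cached index amounts to a map from field name to position, consulted before the linear fallback. The names below are illustrative, with bytes.EqualFold standing in for the package's internal equalFold:

import "bytes"

type field struct{ name string }

type structFields struct {
	list      []field
	nameIndex map[string]int // built once per struct type, then cached
}

func (fs *structFields) byName(name []byte) *field {
	// Exact matches, the common case, are a single map lookup; the
	// string(name) conversion in a map index does not allocate.
	if i, ok := fs.nameIndex[string(name)]; ok {
		return &fs.list[i]
	}
	// Only case-insensitive matches pay for the linear, folded search.
	for i := range fs.list {
		if bytes.EqualFold([]byte(fs.list[i].name), name) {
			return &fs.list[i]
		}
	}
	return nil
}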
@andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@gopherbot

Change https://golang.org/cl/190659 mentions this issue: encoding/json: avoid work when unquoting strings, take 2

@rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@smasher164 modified the milestones: Backlog, Go1.14 Oct 11, 2019
gopherbot pushed a commit that referenced this issue Oct 31, 2019
This is a re-submission of CL 151157, since it was reverted in CL 190909
due to an introduced crash found by a fuzzer. The revert CL included
regression tests, while this CL includes a fixed version of the original
change.

In particular, what we forgot in the original optimization was that we
still need the length and trailing quote checks at the beginning of
unquoteBytes. Without those, we could end up in a crash later on.

We can work out how many bytes can be unquoted trivially in
rescanLiteral, which already iterates over a string's bytes.

Removing the extra loop in unquoteBytes simplifies the function and
speeds it up, especially when decoding simple strings, which are common.

While at it, we can remove the check that s[0]=='"', since all call
sites already meet that condition.

name           old time/op    new time/op    delta
CodeDecoder-8    10.6ms ± 2%    10.5ms ± 1%  -1.01%  (p=0.004 n=20+10)

name           old speed      new speed      delta
CodeDecoder-8   183MB/s ± 2%   185MB/s ± 1%  +1.02%  (p=0.003 n=20+10)

Updates #28923.

Change-Id: I8c6b13302bcd86a364bc998d72451332c0809cde
Reviewed-on: https://go-review.googlesource.com/c/go/+/190659
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Peter Weinberger <pjw@google.com>
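
In outline, the fix keeps the cheap structural guards before trusting the fast path - a loose reconstruction from the commit message, not the actual code:

// unquoteBytes keeps the length and trailing quote checks; without
// them, malformed input reaching the fast path could crash later on.
func unquoteBytes(s []byte) ([]byte, bool) {
	if len(s) < 2 || s[len(s)-1] != '"' {
		return nil, false
	}
	// s[0] == '"' is guaranteed by all call sites, per the commit above.
	// ...fast path for trivially unquotable strings, then the slow path...
	return s[1 : len(s)-1], true
}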
@ianlancetaylor modified the milestones: Go1.14, Backlog Dec 5, 2019
@gopherbot

Change https://go.dev/cl/437558 mentions this issue: encoding/json: remove the struct field decoding index

@mvdan (Member Author) commented Apr 8, 2023

This issue has served its purpose. More performance work will happen for sure, but I don't think this issue needs to remain open.

@mvdan closed this as completed Apr 8, 2023