
encoding/json: Decoder internally buffers full input #11046

Open
kurin opened this issue Jun 3, 2015 · 9 comments


kurin commented Jun 3, 2015

When using the JSON package, if I encode a struct like

type Data struct {
    Count int
    Names []string
}

and then decode it into

type SmallData struct {
    Count int
}

it still allocates memory for the list of names, even though the data is thrown away. This becomes a real problem when I have several multigigabyte JSON files like this. It would be helpful if the JSON parser could identify which fields it cares about, or somehow be told which fields to ignore, and skip over them.

@bradfitz bradfitz added this to the Go1.6 milestone Jun 3, 2015
@bradfitz bradfitz changed the title Decoding JSON allocates memory for fields that aren't used. encoding/json: decoding allocates memory for fields that aren't used. Jun 3, 2015

rsc commented Nov 5, 2015

I don't believe this is true. Specifically, I don't believe it allocates any memory for fields being discarded. If you think it does, please explain why you think that or point to the allocation. Thanks.


kurin commented Nov 5, 2015

I wrote a small test that writes a json file of Data and then (as a new process) reads it into SmallData and prints runtime.MemStats.TotalAlloc: http://play.golang.org/p/5CB3FUL86m

I ran it with --make=10 to --make=1e8, stepping powers of ten.

The resulting plot is here, which indicates that the larger the JSON file, the more memory is consumed reading into SmallData. It's not obvious, but each data point is actually a collection of 3 runs; the variance between runs was very small.


ALTree commented Nov 6, 2015

Pprof says 99% of bytes are allocated here.


rsc commented Nov 25, 2015

The memory here is for holding the JSON input as read in from the file, not for decoding unused fields.

@rsc rsc changed the title encoding/json: decoding allocates memory for fields that aren't used. encoding/json: Decoder internally buffers full input Nov 25, 2015
@rsc rsc modified the milestones: Go1.7, Go1.6 Nov 25, 2015

cespare commented Apr 13, 2016

The title says "Decoder internally buffers full input" but it might be better phrased as "Decoder buffers an entire value at a time".

We introduced Decoder.Token last cycle, so it is technically possible for the user to use it and avoid buffering a whole value at once. Admittedly, that would take a bunch of code.

It would also be possible for the decoder to stop decoding a whole value at once and instead read from the stream into the target structure incrementally. That would be a big refactoring of the decoder. Is that what this bug requires, or is there some simpler option I'm overlooking?

@rsc rsc modified the milestones: Unplanned, Go1.7 Apr 13, 2016

rsc commented Apr 13, 2016

The new Decoder.Token should let people build incremental parsers customized to a particular use case. We cannot change the default behavior: right now, if encoding/json consumes a very large but ultimately malformed JSON value, nothing is written to the destination. Incremental decoding would change those semantics by writing to the destination before discovering that the value was malformed.

It might be possible to have a different opt-in mode in the Decoder, but certainly not at this point in the Go 1.7 cycle.


cespare commented Apr 14, 2016

The corresponding change for an Encoder is #7872.

@ianlancetaylor

Related to #14140, which mentions some possible API changes.


dsnet commented Oct 6, 2023

Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal.
The prototype v2 implementation is truly streaming when writing to an io.Writer or reading from an io.Reader.
See https://github.com/go-json-experiment/jsonbench#streaming for details.
