
encoding/json: Encoder internally buffers full output #7872

Open
extemporalgenome opened this issue Apr 26, 2014 · 26 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Comments

@extemporalgenome
Contributor

What does 'go version' print?

go version devel +2f6b9f80be36 Fri Apr 25 09:46:07 2014 -0600 linux/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

1. Use json.NewEncoder(writer).Encode(value)

What happened?

Each call to json.Encoder.Encode uses an internal bytes.Buffer to buffer all encoded
output prior to writing any of that output.

What should have happened instead?

Output should use little or no internal buffering. Encoder should be able to efficiently
encode (and incrementally output) very large inputs.
@bradfitz
Contributor

Comment 1:

Yeah, suppose it could use a bufio.Writer instead of a bytes.Buffer.

Labels changed: added release-go1.4, repo-main.

Status changed to Accepted.

@gopherbot

Comment 2 by nicolashillegeer:

> Yeah, suppose it could use a bufio.Writer instead of a bytes.Buffer.
This could indeed save a useless copy, as also noted in this issue:
https://golang.org/issue/5683
However, there are some caveats. For example, I'm using an Encoder to encode
straight into a go.net websocket. With the current implementation, the Encoder issues
a single Write once the entire message is complete. With a bufio.Writer, there could be
several writes. The go.net WebSocket implementation treats every Write as an entire
message (as far as I can see from perusing the source), so my JSON could arrive in
arbitrary pieces.
Note that what I'm doing is not necessarily the appropriate usage of go.net WebSockets;
it has websocket.JSON.Receive/Send methods to send and receive entire messages at a
time, and I should be using those. But those methods use json.Marshal/json.Unmarshal
under the hood, which causes buffer churn when used a lot (my use case is relatively
high-throughput). Ideally the go.net WebSockets package would use Encoder/Decoder with a
bytes.Buffer under the hood for the JSON.(Send|Receive) calls, reusing the buffer every
time (either via a sync.Pool or by storing the buffer in the websocket connection
itself, since I believe those don't support concurrent writing anyway).
I would not mourn too much if my current usage of go.net WebSockets stopped working.

@rsc
Contributor

rsc commented Sep 15, 2014

Comment 3:

I don't believe we can change this today. The buffering ensures that if an error is
encountered, no partial JSON is emitted. The only thing we could do is add some method
to encoder to allow people to turn it off themselves.

Labels changed: added release-go1.5, removed release-go1.4.

@slimsag

slimsag commented Dec 22, 2014

I was rather shocked when I discovered this, actually. If my understanding is correct this means that JSON cannot be streamed to a file without it consuming large amounts of memory.

The only thing we could do is add some method to encoder to allow people to turn it off themselves.

I highly agree with this approach.

@bradfitz
Contributor

@slimsag, I don't understand. Each call to encoding/json.Encoder.Encode produces one Write. Even if Encode wrote more eagerly to its Writer, you still have the entire argument to Encode (the whatever interface{}) in memory. So it's at worst a linear factor of memory. This isn't the difference of being able to stream or not.

Realistically if you want to stream JSON, you'd do something like:

     e := json.NewEncoder(f)
     io.WriteString(f, "[")
     first := true
     for .... {
           dataToStream := ....
           if !first {
                 io.WriteString(f, ",")
           }
           first = false
           e.Encode(dataToStream)
     }
     io.WriteString(f, "]")

... and that pattern works fine even today.

Changing the internal buffering (this bug) has nothing to do with making it possible to programmatically generate the data to stream. That would be a much more invasive API change, and not obviously better than the loop above.

ydnar added a commit to ydnar/go that referenced this issue Jan 3, 2015
@ydnar

ydnar commented Jan 3, 2015

This should satisfy the existing requirement (truncation on error) while allowing limited buffering, to enable streaming structures of unbounded/unknown size: https://github.com/ydnar/go/compare/jsonio

@slimsag

slimsag commented Jan 3, 2015

@bradfitz I can see now that I misunderstood what this bug is about, my apologies!

From the title "Encoder internally buffers full output" I had incorrectly thought all calls to Encode were buffered together. I have re-read the issue and understand now that only the data written by a single call to Encode is buffered, so streaming works fine.

@rsc rsc removed accepted labels Apr 14, 2015
@bradfitz bradfitz modified the milestones: Go1.6, Go1.5 Jul 7, 2015
@gopherbot

CL https://golang.org/cl/13818 mentions this issue.

@freeformz

I took a stab at this: https://golang.org/cl/13818

I don't think I'm really happy with the outcome though.

Review / comments welcome.

PS: This is my first attempt to contribute.

@cespare
Contributor

cespare commented Apr 14, 2016

Based on what @rsc wrote in https://golang.org/cl/13818, what we want to do here is to provide an opt-in mechanism for an Encoder to generate output incrementally.

The corresponding change for a Decoder is #11046.

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 6, 2016
@rsc rsc added this to the Go1.9 milestone Oct 26, 2016
@rsc rsc removed this from the Go1.8 milestone Oct 26, 2016
@bradfitz
Contributor

bradfitz commented May 9, 2017

Too late. Didn't happen for Go 1.9.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 May 9, 2017
@imjasonh

Would an API like json.NewEncoder(w).Buffer(numBytes).Encode(foo) be acceptable? Buffer (name TBD obviously) would tell the encoder to flush to w every numBytes bytes. The default behavior would be unchanged.

Buffer would return an Encoder to enable method chaining. Encoder's current SetIndent method doesn't facilitate chaining, but it could. I don't know if it was an explicit decision in the design not to facilitate this.

A similar API change could be made for json.NewDecoder(r).Buffer(numBytes).Decode(&foo).

@andybons
Member

@rsc What do you think of Jason’s approach?

@andybons
Member

@ianlancetaylor for further triage/thoughts.

@ianlancetaylor
Contributor

Related to #20169.

@ianlancetaylor
Contributor

Apart from the issues discussed in #20169 I don't see a reason to set a number of bytes to accumulate. I would just have an option along the lines of SetEscapeHTML that directs the Encoder to stream output data rather than accumulate it. Then instead of writing to a bytes.Buffer, we write directly to the writer. If the caller wants to buffer at a certain size, they can use the bufio package themselves.

Note that the SetIndent support would require a streaming version of Indent, which looks feasible to me.

@imjasonh

Do you have opinions about enabling SetIndent, SetEscapeHTML and the future SetBuffer to be chained?

enc := json.NewEncoder(out)
enc.SetEscapeHTML(true)
enc.SetBuffer(2<<20)
_ = enc.Encode(foo)

vs

_ = json.NewEncoder(out).
    SetEscapeHTML(true).
    SetBuffer(2<<20).
    Encode(foo)

Do any backward-compatibility concerns arise from changing SetIndent to return a *Encoder?

@rsc
Contributor

rsc commented Oct 20, 2017

There's no chaining today; please don't add any.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
@xeoncross

The buffering ensures that if an error is encountered, no partial JSON is emitted.

I have seen multiple developers guarding against partial JSON by using a buffer pool. I myself wasn't aware Go did this. Perhaps the documentation could be updated to mention that on error, no partial encoding is written.

Other languages have made us really defensive programmers, and it's really nice to see the forethought that Go has put into its implementations (while not becoming overly abstracted and superfluous).

@tv42

tv42 commented Jul 31, 2018

Note for anyone who thinks they don't have to defend against partial writes with JSON responses: if you set Content-Type: application/json, you've already committed to success and returning JSON.

@gopherbot

Change https://golang.org/cl/135595 mentions this issue: encoding/json: add SetBufferSize to Encoder to limit memory usage

@imjasonh

There's no chaining today; please don't add any.

@rsc: Can you elaborate on why allowing chaining isn't preferred?

It can be nice for scoping down variables into only the scope where they're needed:

if err := json.NewEncoder(r).SetIndent("", "  ").Encode(i); err != nil {
  // uh oh
}

or

if err := json.NewEncoder(r).
    SetIndent("", "  ").
    Encode(i); err != nil {
  // uh oh
}

vs.

e := json.NewEncoder(r)
e.SetIndent("", "  ")
if err := e.Encode(i); err != nil {
  // uh oh
}
// e is in scope out here now :(

Chaining can also preserve vertical space in exchange for horizontal space, which most displays have more of anyway.

I agree it can be abused and taken to an unhelpful extreme, but for simple/common cases it seems worthwhile, backward-compatible, and fairly straightforward to implement.

@xeoncross

I believe one of the complaints is that method chaining adds cognitive complexity by combining multiple differing transforms, actions, or side effects on the same line.

In your example, you can also use block scoping.

{
	e := json.NewEncoder(r)
	e.SetIndent("", "  ")
	if err := e.Encode(i); err != nil {
	  // uh oh
	}
}

// e is undefined

However, this is not normally an issue as both the compiler and tooling should yell at you.

@imjasonh

@xeoncross That's a reasonable stance. If that's too much cognitive overhead for you (or your code reviewer), you can always just not chain the methods -- the existing behavior would work the same whether or not SetIndent returned its e.

If the stdlib enabled chaining, users who want it could use it in the scenarios where it makes sense to them. Users who don't want to use it, don't have to. 🤷‍♂

@flimzy
Contributor

flimzy commented Sep 20, 2019

I've been looking closely at https://golang.org/cl/135595 and found some drawbacks: one confirmed bug (easily fixed), one likely bug (unconfirmed), a drastic increase in allocations, and reduced throughput.

I think all of these problems are easily fixed, and I'd be happy to contribute a CL. However, the existing CL has some outstanding discussion points (not least of which led to #27735) and seems to have stalled. Would an additional CL be welcome at this point? Or should the existing CL be updated with the necessary fixes (which I can elaborate on)?

@dsnet
Member

dsnet commented Oct 6, 2023

Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal.
The prototype v2 implementation is truly streaming when writing to an io.Writer or reading from an io.Reader.
See https://github.com/go-json-experiment/jsonbench#streaming for details.
