encoding/json: add Encoder.EncodeToken method #40127

rogpeppe · 2020-07-09T07:52:47Z

Currently there is a way to decode JSON token-by-token, but the JSON package does not support encoding tokens that way. It has been stated that just writing the tokens yourself is straightforward enough to do (citation needed), but it's not actually that easy to do.

Here's some code that streams JSON from input to output: https://play.golang.org/p/H6Xl_twRIyC. It's at least 50 lines of code to do the basic streaming functionality, it's easy to get wrong by forgetting to put a colon or a comma in the right place, and if you want indentation or HTML escaping, you have to do it yourself.

I propose that we add an EncodeToken method to json.Encoder:

// EncodeToken encodes a single JSON token to the encoder. The order of
// tokens must result in valid JSON syntax (as returned by the
// Decoder.Token method). That is, delimiters [ ] { } must be properly
// nested and matched, and object keys must be strings (but see the note
// on using Encode below).
//
// EncodeToken returns an error if the token cannot be encoded or is
// misplaced, for example because there's a delimiter mismatch, a
// non-string token is passed where an object key is expected, a string
// contains invalid UTF-8 or a Number isn’t formatted correctly.
//
// Note that it’s OK to mix calls to EncodeToken with calls to Encode.
// In particular, it’s also valid to call Encode when an object key is
// expected - in this case, the value is subject to the same conversion
// rules as for map keys.
func (enc *Encoder) EncodeToken(tok Token) error

There would be no way to produce syntactically invalid output (other than truncating the output by not completing the encoded value). The Encoder would keep track of the current state in a stack internally.

An example: if we wanted to stream a large array of JSON objects, we could do:

enc.EncodeToken(json.Delim('['))
for {
    enc.Encode(arrayItem)
}
enc.EncodeToken(json.Delim(']'))

A slightly more comprehensive example is here

The code to stream a set of JSON values would become considerably simpler: https://play.golang.org/p/Wec5wepCYbE

Completeness validation

It might be useful to provide callers with a final check that their encoded value is in fact complete (that is, no final closing tokens have been omitted). To do that, the following method could be provided:

// ClosingToken returns the token necessary to close the current encoding object or array.
// It returns either Delim('}'), Delim(']'), or nil if there is no current object or array.
func (enc *Encoder) ClosingToken() Token

This method makes it straightforward to check that the object is complete (if env.ClosingToken == nil), but also could be used for sanity checks when developing, better errors, or to implement a Flush method that automatically closes all open objects and arrays:

for enc.ClosingToken() != nil {
   enc.EncodeToken(enc.ClosingToken())
}

Discussion

As with the current Decoder.Token implementation, Encoder.EncodeToken will not be particularly efficient, as it requires passing strings and numbers as strings inside an interface value. I think that this can be considered an orthogonal issue: a solution to the Decoder.Token garbage issue may also be applied to Decoder.Token. The symmetry between Decoder.Token and Encoder.EncodeToken is a useful property which makes this API easier to explain and use, and shouldn’t be discarded lightly, in my view.

Once Encoder.Encode is invoked, there’s no way to return to streaming mode, for example to encode a large array within a MarshalJSON method. This is a limitation of the json.Marshaler interface, and could potentially be addressed by a fix in that area.

It would almost be possible to implement this without adding a method at all by changing the Encode method to recognize Delim. However, it’s currently valid to pass a Delim value to encode (it encodes as a number), so this would break backward compatibility. Also, it’s arguably cleaner to have a separation between the low level token API and the higher level Encode API.

Related proposals

#33714

The text was updated successfully, but these errors were encountered:

rsc · 2020-10-28T17:40:33Z

EncodeToken seems like a nice analog of Decoder.Token.

I'm less convinced about ClosingToken. If the program doesn't know what syntax comes next, what business does it have encoding JSON token by token?

rogpeppe · 2020-10-29T08:24:05Z

Thanks for the review!

I'm less convinced about ClosingToken. If the program doesn't know what syntax comes next, what business does it have encoding JSON token by token?

I'm not deeply attached to ClosingToken either. I was trying to think through what might happen if this API is being used across modules. The program should always produce valid syntax, but perhaps some component has produced an invalid token and not checked the error. My thought it that it would be nice to be able to have some assurance that the JSON that has been encoded is valid without parsing the whole thing again. Initially I thought of just a Complete() bool method, but an error message produced because that returns false wouldn't be very informative - it would be better, I think, to say something about how it's incomplete, which was where the idea came from for ClosingToken which provides that functionality, would be trivial to implement and might possibly have other uses too.

dsnet · 2020-11-02T23:35:02Z

What's the expected flushing behavior for this feature? Do we expect that every call to Encoder.EncodeToken results in at least one Write call to the underlying io.Writer?

I see several reasonable semantics:

Every call to Encoder.EncodeToken results in the data immediately written to the underlying io.Writer. However, there would be significant performance implications for this semantic.
Encoder.EncodeToken only flushes to the underlying io.Writer whenever the internal buffer gets sufficiently full. In order to ensure the data is fully flushed, we provide an Encoder.Flush method.
Encoder.EncodeToken flushes to the underlying io.Writer whenever the internal buffer gets sufficiently full OR a top-level JSON value has been completely encoded. For example, encoding the following sequence of tokens: [, "foo", 1, null, ] would only flush with the final ], which marks the end of the top-level JSON array.

In my opinion, I think option 3 is the most reasonable. It requires no new API, and has decent performance. Tracking whether the top-level value is complete is logic the Encoder has to do anyways.

ChrisHines · 2020-11-02T23:52:18Z

I also vote for option 3. For structured logging where each log record is typically a top-level JSON object flushing on each object has nice properties.

rogpeppe · 2020-11-03T08:12:16Z

I also agree that option 3 seems like the right approach, although it might also be useful to provide an explicit Flush method.

rsc · 2020-11-04T18:16:49Z

It seems like people are generally in favor of this (if accepted, it would be for Go 1.17). Do I have that right?

dsnet · 2020-11-11T00:28:00Z

I'm okay with this API, but I'd like to sanity check the behavior for Numbers.

The documentation currently says:

EncodeToken returns an error if ... a Number isn’t formatted correctly.

Since there are no Number constructors in the json package, it leaves formatting the number up to the user. The most obvious functionality to use would be strconv.FormatFloat. I think it is important property that if strconv.FormatFloat is used, the formatted number be guaranteed to be a valid JSON number.

It seems to me that:

The e, E, f, g, and G format flags will always produce output that is a valid JSON number. (Technically, Go's grammar for a floating-point number is more expressive than JSON's grammar, but it seems that strconv.FormatFloat is implemented in a way that is compatible; e.g. no leading zeros and at least one digit after the decimal point).
The only exceptions are for NaN and Infinity, but I think it's acceptable that those values result in a encode error as they already do (i.e., json.Marshal(math.NaN()) reports an error today).

Is that correct?

rogpeppe · 2020-11-11T09:00:01Z

@dsnet That's a good call. I think you're right, but it would be good to have definite assurance here (and we could mention FormatFloat in the docs too).

rsc · 2020-11-11T18:30:27Z

Yes, strconv.FormatFloat produces valid JSON for finite numbers, except perhaps if you use prec=0 with 'f'.

rsc · 2020-11-11T18:30:39Z

Based on the discussion above, this seems like a likely accept.

rsc · 2020-11-18T18:02:18Z

No change in consensus, so accepted.

dividinglimits · 2021-02-05T10:22:23Z

Is there an appetite to change the function name to prevent stutter? I see that Encoder.Token() to maintain consistency with Decoder.Token() is not possible as it is already already defined and exported. I do not have a suggestion for a contextually appropriate alternative at this time.

gopherbot · 2022-09-12T07:11:12Z

Change https://go.dev/cl/430235 mentions this issue: encoding/json: add Encoder.EncodeToken method

dsnet · 2023-10-06T05:58:26Z

Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal.
See the "Encoder and Decoder" section of the discussion. The Encoder.WriteToken method implements what this proposal is asking for.

gopherbot added this to the Proposal milestone Jul 9, 2020

gopherbot added the Proposal label Jul 9, 2020

rogpeppe mentioned this issue Jul 9, 2020

proposal: encoding/json: garbage-free reading of tokens #40128

Open

ianlancetaylor added this to Incoming in Proposals (old) Jul 9, 2020

rsc moved this from Incoming to Active in Proposals (old) Oct 28, 2020

rsc mentioned this issue Oct 29, 2020

proposal: review meeting minutes #33502

Open

rsc moved this from Active to Likely Accept in Proposals (old) Nov 11, 2020

rsc added the Proposal-FinalCommentPeriod label Nov 11, 2020

rsc moved this from Likely Accept to Accepted in Proposals (old) Nov 19, 2020

rsc changed the title ~~proposal: encoding/json: add Encoder.EncodeToken method~~ encoding/json: add Encoder.EncodeToken method Nov 19, 2020

rsc added the Proposal-Accepted label Nov 19, 2020

rsc modified the milestones: Proposal, Backlog Nov 19, 2020

rsc removed the Proposal-FinalCommentPeriod label Jul 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

encoding/json: add Encoder.EncodeToken method #40127

encoding/json: add Encoder.EncodeToken method #40127

rogpeppe commented Jul 9, 2020

rsc commented Oct 28, 2020

rogpeppe commented Oct 29, 2020

dsnet commented Nov 2, 2020

ChrisHines commented Nov 2, 2020

rogpeppe commented Nov 3, 2020

rsc commented Nov 4, 2020

dsnet commented Nov 11, 2020

rogpeppe commented Nov 11, 2020

rsc commented Nov 11, 2020

rsc commented Nov 11, 2020

rsc commented Nov 18, 2020

dividinglimits commented Feb 5, 2021

gopherbot commented Sep 12, 2022

dsnet commented Oct 6, 2023 •

edited

encoding/json: add Encoder.EncodeToken method #40127

encoding/json: add Encoder.EncodeToken method #40127

Comments

rogpeppe commented Jul 9, 2020

Completeness validation

Discussion

Related proposals

rsc commented Oct 28, 2020

rogpeppe commented Oct 29, 2020

dsnet commented Nov 2, 2020

ChrisHines commented Nov 2, 2020

rogpeppe commented Nov 3, 2020

rsc commented Nov 4, 2020

dsnet commented Nov 11, 2020

rogpeppe commented Nov 11, 2020

rsc commented Nov 11, 2020

rsc commented Nov 11, 2020

rsc commented Nov 18, 2020

dividinglimits commented Feb 5, 2021

gopherbot commented Sep 12, 2022

dsnet commented Oct 6, 2023 • edited

dsnet commented Oct 6, 2023 •

edited