
proposal: encoding/json: preserve unknown fields #22533

Open
ibrt opened this issue Nov 1, 2017 · 9 comments

Comments

@ibrt
Contributor

ibrt commented Nov 1, 2017

Yesterday I implemented #15314, which makes it possible to optionally fail JSON decoding when an object has a key that cannot be mapped to a field in the destination struct.

In the discussion of that proposal, a few people floated the idea of having a mechanism to collect such keys/values instead of silently ignoring them or failing to parse.

The main use case I can think of is allowing JSON to be decoded into structs, modified, and serialized back while preserving unknown keys (modulo the order in which they appeared, and potentially "duplicate" keys that are dropped due to uppercase/lowercase collisions, etc.). This behavior is supported by many languages and libraries, and by other serialization systems such as protocol buffers.

I propose to add this type to the JSON package:

```go
type UnknownFields map[string]interface{}
```

Users of the JSON package can then embed this type in structs for which they'd like to use the feature:

```go
type Data struct {
	json.UnknownFields
	FirstField  int
	SecondField string
}
```

On decoding, any object key/value that cannot be mapped to a field in the destination struct would be decoded and stored in UnknownFields. On encoding, any key present in UnknownFields would be added to the serialized object.

I can think of a couple of tricky edge cases, and I propose to resolve them as follows:

Nested structs

It's possible for nested structs to also declare UnknownFields. In such cases, any UnknownFields in nested structs should be ignored, both when decoding and encoding. Pros: it is consistent with how we already flatten fields, and it's the only way to ensure decoding is unambiguous. Cons: keys that were somehow set in a child struct's UnknownFields would be ignored on encoding.

Key collisions

When encoding, it's possible that a key in UnknownFields collides with another field on the struct. In such cases, the key in UnknownFields should be ignored. Pros: it is consistent with the behavior in the absence of UnknownFields, seems generally less error-prone, cannot happen in a plain decode/edit/encode cycle, and is unambiguous. Cons: it can lead to silently dropping some values.

PS: I'm happy to do the implementation should the proposal or some variation of it be approved.

@gopherbot gopherbot added this to the Proposal milestone Nov 1, 2017
@ibrt ibrt changed the title from "proposal: encoding/json: collect unknown fields" to "proposal: encoding/json: preserve unknown fields" Nov 1, 2017
@dsnet
Member

dsnet commented Nov 1, 2017

  • When encoding, what happens when UnknownFields contains a key that is a duplicate of a field in the struct?
  • What if I want the unknown values as json.RawMessage instead of interface{}, which is fairly nasty to work with?

Stepping back a moment. What is the use case you have in mind? Proposals are far more effective if they start with concrete problems they are trying to solve.

@ibrt
Contributor Author

ibrt commented Nov 1, 2017

Hi!

As for the first bullet point, it is already addressed in the proposal (last section: Key collisions).

Regarding json.RawMessage, it seems to me that if one wants to manipulate these fields, they would have to deserialize them anyway; if one just wants to pass them through, it doesn't make a practical difference, except that deserialization would be cheaper. If desired, we could simply add a type RawUnknownFields map[string]json.RawMessage, or use a different method for specifying where to put the data (no strong opinion, I just want the feature).

The use case is also described in the proposal: decoding into a struct, editing it, and then encoding it back while preserving unknown fields. It is a relatively common need in distributed systems where JSON is used for RPC. For context, protocolbuffers/protobuf#272 is an interesting read: protobuf removed support for preserving unknown fields and then added it back.

@rsc
Contributor

rsc commented Nov 6, 2017

I'd like to save this issue for a future rethink of all of encoding/json. It's important we don't keep adding piecemeal each new feature that seems useful by itself.

@pascaldekloe
Copy link
Contributor

It is quite common for JSON APIs and formats to mix defined and custom keys within a single object, e.g., JWT claims.

Having a placeholder for unmapped fields per struct would allow for loose structure parsing.

What is the status of the rethink? I'm offering to write an implementation to your liking, if there's any sympathy for the concept.

@mvdan
Member

mvdan commented Jul 7, 2020

I'm late to the party, but isn't this proposal very close to #6213?

@mitar
Copy link
Contributor

mitar commented Nov 6, 2020

In #42417 I proposed to have the following syntax:

```go
type Message struct {
	Name  string                 `json:"name"`
	Type  string                 `json:"type"`
	Skip  string                 `json:"-"`
	Extra map[string]interface{} `json:"+"`
}
```

So + would mark a field into which all extra/unexpected fields would go.

As for what to do about overlapping fields, we could specify that with a flag, like:

```go
Extra map[string]interface{} `json:"+,preferExtra"`
```

This would make extra fields override any struct fields with the same name.

@dsnet
Copy link
Member

dsnet commented Nov 11, 2020

I agree with @mvdan that if this feature is provided, it should probably be implemented in terms of a more general feature (i.e., #6213) that permits splitting off part of a JSON object for a Go struct into some other data structure.

Assuming that #6213 exists, it is conceivable that json could provide some type that assists in this purpose. However, it's not clear what the type should be. Several reasonable candidates are:

  • RawMessage: This would store all of the unknown fields as a raw JSON object in the exact order they are encountered. This carries the greatest fidelity in preserving the original input and is the most performant.
  • map[string]RawMessage: This would store all unknown fields as a key-value map where the value is the raw JSON value of the unknown field. Ordering is not preserved.
  • map[string]interface{}: This would store all unknown fields as a key-value map where the value is a dynamic representation of the JSON value. I think this is the worst representation since it is lossy (e.g., precision may be lost for JSON numbers). However, it may be the nicest to interact with if users need to introspect the unknown fields.
  • some other representation.
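The lossiness of the `map[string]interface{}` option can be demonstrated with the current encoding/json. By default, JSON numbers decoded into `interface{}` become `float64`, while `json.RawMessage` keeps the original bytes. `Decoder.UseNumber` is the existing way to keep numbers as `json.Number` strings. In this small sketch, the input value and the helper function names are arbitrary.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// input holds an integer too large for float64 to represent exactly.
const input = `{"id":12345678901234567891}`

// viaInterface decodes into interface{} and re-encodes the number.
func viaInterface() string {
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(input), &m); err != nil {
		panic(err)
	}
	b, err := json.Marshal(m["id"]) // m["id"] holds a float64 here
	if err != nil {
		panic(err)
	}
	return string(b)
}

// viaRawMessage keeps the value's original bytes.
func viaRawMessage() string {
	var m map[string]json.RawMessage
	if err := json.Unmarshal([]byte(input), &m); err != nil {
		panic(err)
	}
	return string(m["id"])
}

// viaUseNumber decodes into interface{}, but numbers become json.Number.
func viaUseNumber() string {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.UseNumber()
	var m map[string]interface{}
	if err := dec.Decode(&m); err != nil {
		panic(err)
	}
	return m["id"].(json.Number).String()
}

func main() {
	fmt.Println(viaInterface() == "12345678901234567891") // false: precision lost
	fmt.Println(viaRawMessage())                          // 12345678901234567891
	fmt.Println(viaUseNumber())                           // 12345678901234567891
}
```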

Alternatively, if the feature provided by #6213 is sufficiently simple, we could provide no helper types and leave it to users to figure out how to preserve unknown fields.

Also, we need to consider how such a feature may interact with the Decoder.DisallowUnknownFields feature that already exists. If DisallowUnknownFields is specified, what is the expected behavior? Do we treat unknown fields as an error? Do we ignore the unknown fields since something in the object can store it?

@pascaldekloe
Copy link
Contributor

Raw implies unmodified, so we can't collect the unknown fields as raw: extracting only some members of an object already modifies it.

The main reason for parsing extended fields together with an expected structure is performance. Using a raw value per field has the opposite effect, since it requires multiple parse operations.

Options relying on field order should be avoided. The specifications prohibit such logic explicitly.

@dsnet
Copy link
Member

dsnet commented Oct 6, 2023

Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal.
See the "inline" and "unknown" struct tag options under the "Struct tag options" section.


8 participants