Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/json: reject unknown fields in Decoder #15314

Closed
okdave opened this issue Apr 15, 2016 · 36 comments
Closed

proposal: encoding/json: reject unknown fields in Decoder #15314

okdave opened this issue Apr 15, 2016 · 36 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge Proposal-Accepted
Milestone

Comments

@okdave
Copy link
Contributor

okdave commented Apr 15, 2016

Currently, json.Unmarshal (and json.Decoder.Decode) ignore fields in the incoming JSON which are absent in a target struct.

For example, the JSON { "A": true, "B": 123 } will successfully unmarshal into struct { A bool }.

This makes it difficult to do strict JSON parsing. One place this is an issue is in API creation, since you want to reject unknown fields on incoming JSON requests to allow safe introduction of new fields in the future (I've had this use case previously when working on a JSON REST API). It also can lead to typos leading to bugs in production: when a field is almost always the zero value in incoming JSON, you might not realise you're not even reading a misspelled field.

I propose that a new method is added to Decoder to turn on this stricter parsing, in much the same way UseNumber is used today. When in strict parsing mode, a key in the incoming JSON which cannot be applied to a struct will result in an MissingFieldError error. Like UnmarshalTypeError, decoding would continue for the remaining incoming data.

d := json.NewDecoder(r.Body)
d.UseStrictFields()
err := d.Decode(myStruct)
@bradfitz
Copy link
Contributor

SGTM. @rsc, @adg?

@extemporalgenome
Copy link
Contributor

Could we add a mechanism to the json package that allows a struct to declare that it should have strict or non-strict handling, independent of how the decoder is configured? This could be an interface, or a pair of zero-size types (or a bool type) that could be embedded -- in much the same way that the xml package provides configurability -- or perhaps a json struct tag that could be recognized on any padding field.

There are cases where it's desirable for some json objects in a "document" to be treated strictly, whereas others in the same document should be non-strict (yet still warrant use of a struct instead of a map). Allowing an "overflow" map field would make it easier to express this (if you include such a map field, and turn on strict handling, the containing struct would be non-strict).

@rsc
Copy link
Contributor

rsc commented Apr 15, 2016

This is a duplicate of #14750. Continuing @extemporalgenome's thought, I wonder if the strictness should always be in the tag instead of a new method. Per-struct is difficult but per-field is fine. You could imagine json:"Foo,exactname".

@bradfitz
Copy link
Contributor

@rsc, partial duplicate. This bug is about undeclared fields. #14750 is about differing case.

@rsc rsc changed the title proposal: add strict missing field checks to encoding/json.Decoder proposal: some way to reject unknown fields in encoding/json.Decoder Apr 15, 2016
@rsc
Copy link
Contributor

rsc commented Apr 15, 2016

Ah, thanks very much. OK, then UseStrictFields is definitely a bad name. :-)

We could certainly solve Dave's problem but I wonder if we should do something a bit more generic. In the XML decoder there is a ",any" tag that you can use to say "unexpected things go here". I wonder if JSON should allow a map[K]V (K=string, V=interface{} would be common but not strictly required) to have a tag json:",any" to collect otherwise ignored key-value pairs. Then the decoder can do whatever it needs to do with them. Similarly on encoding those would have to be reinserted into the marshalled object.

It might be too late for Go 1.7 to work all this out.

@bradfitz bradfitz added this to the Unplanned milestone Apr 15, 2016
@cespare
Copy link
Contributor

cespare commented Apr 16, 2016

The "overflow" map discussed by @extemporalgenome and @rsc has been requested a few times. In fact, #6213 already exists for such a proposal. (The confusing name "inline" for the feature makes it hard to discover.) Perhaps this bug should supersede #6213.

@cespare
Copy link
Contributor

cespare commented Apr 16, 2016

I like the overflow map as a solution to the extra keys problem.

I think the dual of this issue is also important: sometimes you want to know that a key was missing from the JSON object. (In the case of a misspelling as in @okdave's motivation, you could detect either the missing key or the extra key.)

Adding a way to error on missing JSON keys is tracked by #6901. I believe these should be considered at the same time.

@okdave
Copy link
Contributor Author

okdave commented Apr 16, 2016

An overflow map would work for this purpose. One downside to adding that as a struct field would be that for deeply nested structures, you'd need to add the field to multiple structs and do a lot of manual bookkeeping after unmarshalling. One big advantage is no (terribly named) new method on Decoder and that it works with Unmarshal too.

Would that same map be used to add additional keys during Marshal? Or ignored there?

@extemporalgenome
Copy link
Contributor

@okdave users would likely expect any Marshal(Unmarshal()) to produce output equivalent to the input.

@okdave
Copy link
Contributor Author

okdave commented Apr 19, 2016

Yup, I agree. @rsc I wasn't aiming for 1.7, but I'm happy to jump on this if there's consensus on this approach.

@dvyukov
Copy link
Member

dvyukov commented Jul 1, 2016

I need this as well. When parsing configs any unknown field is either a typo, or an old version, or something equally bad.

    unknown map[string]interface{} // `json:",any"`

would work for me.

@dvyukov
Copy link
Member

dvyukov commented Jul 11, 2016

On second thought, any sink will work poorly with nested structs. One would need to add the sink to all structs and then manually check all of them. Obviously error prone.

@rogpeppe
Copy link
Contributor

I'm +1 on providing a way to do this.
Perhaps something like this:

// DisallowUnknownFields causes the Decoder to return an error
// if it tries to decode an object member into a struct that doesn't have
// a field with that member's name.
func (dec *Decoder) DisallowUnknownFields()

The implementation is trivial - I can propose it if this proposal is agreed on.

I see this as orthogonal to the UseStrictNames suggestion from #14750.

@rogpeppe
Copy link
Contributor

This issue seems to be an exact duplicate of #6901.

@okdave
Copy link
Contributor Author

okdave commented Aug 3, 2016

@rogpeppe I like DisallowUnknownFields, and prefer it to the overflow map for the reason that @dvyukov mentions above (nested structs).

@mspiegel
Copy link

+1 for DisallowUnknownFields(). The interface is similar to the existing UseNumber() method.

@mspiegel
Copy link

I have a patch that implements this feature. It is passing nearly all of the existing encoding/json unit tests except one test is failing. Wasn't sure how to get feedback on my patch? Do I submit it to gerrit even though it needs additional work? I've created a pull request from my fork against my fork so that the changes can be browsed: https://github.com/mspiegel/go/pull/1/files.

The failing test is related to a key that is being "annihilated", that's the terminology in the source code when two fields in a struct have the same JSON name.

@okdave
Copy link
Contributor Author

okdave commented Aug 17, 2016

If you have a change that's ready for people to provide feedback on, gerrit's the right place to have the discussion. If there's one thing you can't figure out, then just check the rest of the change is sane and send it out.

You should add a comment to the change once you've mailed it to indicate what help you still need, and add "DO NOT SUBMIT" on the second line of the commit message to indicate that the CL is currently unworkable. Per @rsc do not submit means "something you do want people to see and review but that you know is not ready".

Let me know if you need more info.

@gopherbot
Copy link

CL https://golang.org/cl/27231 mentions this issue.

@mspiegel
Copy link

Thanks for the initial feedback on gerrit it has been helpful. Can I convince a consensus to change the name of the new option from DisallowUnknownFields to DisallowUnusedKeys? The motivation to change 'Fields' to 'Keys' is that the new option generates an error based on ignored JSON keys not based on ignored struct fields. The motivation to change 'Unknown' to 'Unused' is more subjective: Unused is a simple definition of the JSON keys that generate an error. Unknown JSON keys are defined as those keys which are unused.

@okdave
Copy link
Contributor Author

okdave commented Aug 18, 2016

I prefer DisallowUnknownFields. For starters Unused means something subtly different to unknown: we don't really know whether someone is using a key or not. I think field is fine too: the target is a struct, and the field is unknown/missing on that struct.

@mspiegel
Copy link

OK works for me. I'll move forward on the patch.

@mspiegel
Copy link

mspiegel commented Sep 7, 2016

What happens next? The patch is complete and submitted to gerrit https://golang.org/cl/27231. It's waiting further review.

@rdtr
Copy link

rdtr commented Dec 27, 2016

@mspiegel I see, thank you for the clarification!

@lukescott
Copy link

I like DisallowUnknownFields when you want to have strict validation. It keeps the structs clean. However, I would also like to see a json:",any" option for cases where you want to collect fields not defined in the struct. In my case I have an object that has some well known fields, but I also want to keep user-defined data that is not well known.

IMO, the two cases are closely related and I would like to see them both addressed at the same time.

@mspiegel
Copy link

mspiegel commented Feb 1, 2017 via email

@lukescott
Copy link

Given that map[string]interface{} is used if you were to Unmarshal an object into interface{}, wouldn't requiring an map[string]interface{} type for ,any be sufficient? The purpose is to save the data instead of dropping it.

Even if you could support other types such as map[string]string, I'm not sure handling "what do you do with a non-string value" is a necessary pursuit.

What if ,any required map[string]interface{} and other types are revisited at a later time?

@dc0d
Copy link

dc0d commented May 16, 2017

There could be a special type json.Meta (which is a map[string]interface) that would get embedded and json would treat it properly:

type Data struct {
	Known int      `json:"known"`
	Parts []string `json:"parts"`
	Go    string   `json:"go"`
	Here  string   `json:"here"`

	json.Meta // rest would get marshal/unmarshal to this
}

It would be nice if this was possible for json.RawMessage too.

combor pushed a commit to alphagov/paas-compose-broker that referenced this issue May 26, 2017
We had several options how to implement that:

* one plan and user provided json to specify number of units: we dont
like this option
* multiple plans and we hardcode number of units based on plan name
which means we have to change broker code every time the plan changes :
nah...

* fork brokerAPI and modify the Catalog struct and do PR upstream: it's
hard to extend this in a generic way by using json.RawMessage. The name
of the field has to be known and speciefied as json unmarshaller option.
So we would have to specify `json:"units,omitempty"` json unmarshaler
option which is very compose specific.
There is a go lang issue opened to fix that:
golang/go#15314

* do not use upstream Catalog and implement our own struct. Not posible
- we have to return the type specified in the brokerapi interface to
implemet it:
https://github.com/pivotal-cf/brokerapi/blob/0d62e62ec041b8ff39c0e304f437b4eabb4175fa/service_broker.go#L10

* extend external Catalog struct to include units field - not possible.
same reason as above

I had to implement the sixth option. Now we have a new struct that
partialy copies upstream catalog structure but holds only fields (IDs
and Units ) that we care about. This together with upstream catalog is
wrrapped into on struct and stored in main broker structure so all
interface methods have access to it. There is also new catalog.Load method
that replaces catalog.New. It has two unmarshalling passes to fill
upstream catalog and our Units struct.
henrytk added a commit to alphagov/paas-compose-broker that referenced this issue Jun 9, 2017
We had several options how to implement that:

* one plan and user provided json to specify number of units: we dont
like this option
* multiple plans and we hardcode number of units based on plan name
which means we have to change broker code every time the plan changes :
nah...

* fork brokerAPI and modify the Catalog struct and do PR upstream: it's
hard to extend this in a generic way by using json.RawMessage. The name
of the field has to be known and speciefied as json unmarshaller option.
So we would have to specify `json:"units,omitempty"` json unmarshaler
option which is very compose specific.
There is a go lang issue opened to fix that:
golang/go#15314

* do not use upstream Catalog and implement our own struct. Not posible
- we have to return the type specified in the brokerapi interface to
implemet it:
https://github.com/pivotal-cf/brokerapi/blob/0d62e62ec041b8ff39c0e304f437b4eabb4175fa/service_broker.go#L10

* extend external Catalog struct to include units field - not possible.
same reason as above

I had to implement the sixth option. Now we have a new struct that
partialy copies upstream catalog structure but holds only fields (IDs
and Units ) that we care about. This together with upstream catalog is
wrrapped into on struct and stored in main broker structure so all
interface methods have access to it. There is also new catalog.Load method
that replaces catalog.New. It has two unmarshalling passes to fill
upstream catalog and our Units struct.
combor pushed a commit to alphagov/paas-compose-broker that referenced this issue Jun 9, 2017
We had several options how to implement that:

* one plan and user provided json to specify number of units: we don't
like this option
* multiple plans and we hardcode number of units based on plan name
which means we have to change broker code every time the plan changes :
nah...

* fork brokerAPI and modify the Catalog struct and do PR upstream: it's
hard to extend this in a generic way by using json.RawMessage. The name
of the field has to be known and specified as json unmarshaller option.
So we would have to specify `json:"units,omitempty"` json unmarshaler
option which is very compose specific.
There is a go lang issue opened to fix that:
golang/go#15314

* do not use upstream Catalog and implement our own struct. Not possible
- we have to return the type specified in the brokerapi interface to
implement it:
https://github.com/pivotal-cf/brokerapi/blob/0d62e62ec041b8ff39c0e304f437b4eabb4175fa/service_broker.go#L10

* extend external Catalog struct to include units field - not possible.
Same reason as above

I had to implement the sixth option. Now we have a new struct that
partially copies upstream catalog structure but holds only fields (IDs
and Units ) that we care about. This together with upstream catalog is
wrrapped into one struct and stored in main broker structure so all
interface methods have access to it. There is also new catalog.Load method
that replaces catalog.New. It has two unmarshalling passes to fill
upstream catalog and our Units struct.
@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
@dsnet dsnet changed the title proposal: some way to reject unknown fields in encoding/json.Decoder proposal: encoding/json: reject unknown fields in Decoder Oct 11, 2017
@gopherbot
Copy link

Change https://golang.org/cl/74830 mentions this issue: encoding/json: disallow unknown fields in Decoder

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge Proposal-Accepted
Projects
None yet
Development

No branches or pull requests