Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: allow non-string map keys that implement encoding.TextMarshaler/TextUnmarshaler #12146

Closed
augustoroman opened this issue Aug 14, 2015 · 32 comments

Comments

@augustoroman
Copy link
Contributor

Original discussion at https://groups.google.com/forum/#!searchin/golang-dev/json$20map/golang-dev/5gSHNrJQpUI/vZGSGRmUrC0J

Currently, json.Marshal will fail to marshal Go maps that have non-string keys, e.g.:

// http://play.golang.org/p/2m9wLZATqw
type Coord struct { X,Y int }
occupied := map[Coord]bool{}
occupied[Coord{1,2}] = true
data, err := json.Marshal(occupied)
fmt.Printf("Data: %s\nErr: %v", data, err)

I propose to enhance the encoding/json package such that:

  1. for json.Marshal
    • If the map key is a string kind it is used directly.
    • Otherwise if the map key satisfies the encoding.TextMarshaler interface then that is used to generate the map key.
    • Otherwise it fails as it does today.
  2. for json.Unmarshal
    • If the map key is a string kind it is written directly.
    • Otherwise if the map key satisfies the encoding.TextUnmarshaler interface then that is used to decode the map key.
    • Otherwise it fails as it does today.

Example

(http://play.golang.org/p/VxhFluFKTX)

This would, for example, allow a map[Coord]bool to be Marshaled & Unmarshaled if Coord was defined as:

type Coord struct{ X, Y int }
func (c Coord) MarshalText() ([]byte, error) {
    return []byte(fmt.Sprintf("%d,%d", c.X, c.Y)), nil
}
func (c *Coord) UnmarshalText(p []byte) error {
    if n, err := fmt.Sscanf(string(p), "%d,%d", &c.X, &c.Y); err != nil {
        return err
    } else if n != 2 {
        return errors.New("Cannot parse coord: " + string(p))
    }
    return nil
}

And the Go struct

map[Coord]bool{Coord{1, 2}: true, Coord{3, 4}: true}

would correspond to

{"1,2": true,"3,4":true}

Considerations

If the struct marshals to a non-unique value, e.g.

func (c Coord) MarshalText() ([]byte, error) {
    return []byte("always the same"), nil
}

The the json encoder would output JSON that has repeated keys, e.g.:

{"always the same": true,"always the same":true}

This is valid JSON.

Similarly, when decoding a map, it would unmarshal each key and value and then assign that into the map, so last-write-wins, as is currently done.

An interesting side effect is that it would then be possible to explicitly handle JSON with repeated keys (or even explicitly record the order of the json keys) by having the go-side map key TextUnmarshaler have non-deterministic unmarshaling (e.g. record timestamp for each unmarshal).

@mikioh mikioh changed the title Proposal: Allow json.Marshal / json.Unmarshal handle non-string map keys if they are encoding.TextMarshaler / encoding.TextUnmarshaler proposal: Allow json.Marshal / json.Unmarshal handle non-string map keys if they are encoding.TextMarshaler / encoding.TextUnmarshaler Aug 14, 2015
@stemar94
Copy link

I gave it a try. First hacked version (encode should be fine, but decode is ugly).
https://go-review.googlesource.com/#/c/14551/

@adg adg added Proposal and removed Proposal labels Sep 25, 2015
@extemporalgenome
Copy link
Contributor

The proposed "if string kind, it is used directly" behavior is the opposite of the json package's semantics; I suggest we reverse that so that it is:

  • for json.Marshal: if the map key implements json.Marshaler or encoding.TextMarshaler, they are used in that order of preference; otherwise, if the key type is of string kind, the map keys are used directly.
  • for json.Unmarshal: if the map key implements json.Unmarshaler or encoding.TextUnmarshaler, they are used in that order of preference; otherwise, if the key type is of string kind, the JSON object keys are stored directly.

@adg
Copy link
Contributor

adg commented Oct 12, 2015

I think this sounds reasonable to me. Interested to hear some other
opinions.

On 12 October 2015 at 05:57, Kevin Gillette notifications@github.com
wrote:

The proposed "if string kind, it is used directly" behavior is the
opposite of the json package's semantics; I suggest we reverse that so that
it is:

  • for json.Marshal: if the map key implements json.Marshaler or
    encoding.TextMarshaler, they are used in that order of preference;
    otherwise, if the key type is of string kind, the map values are used
    directly.
  • for json.Unmarshal: if the map key implements json.Unmarshaler or
    encoding.TextUnmarshaler, they are used in that order of preference;
    otherwise, if the key type is of string kind, the JSON object keys are
    stored directly.


Reply to this email directly or view it on GitHub
#12146 (comment).

@stemar94
Copy link

The proposed "if string kind, it is used directly" behavior is the opposite of the json package's semantics; I suggest we reverse that so that it is...

I would agree with that, but what about the go1 compatibility promise?

Rest is fine with me.

@extemporalgenome
Copy link
Contributor

@stemar94 I don't see how the compat guarantee applies here. As far as I
can tell, this proposal hasn't been merged into any release yet.
On Oct 13, 2015 4:33 AM, "stemar94" notifications@github.com wrote:

The proposed "if string kind, it is used directly" behavior is the
opposite of the json package's semantics; I suggest we reverse that so that
it is...

I would agree with that, but what about the go1 compatibility promise?

Rest is fine with me.


Reply to this email directly or view it on GitHub
#12146 (comment).

@augustoroman
Copy link
Contributor Author

@extemporalgenome Existing keys that have string kind but are also TextMarshalers would change behavior and possibly fail (depending on the marshaling)

E.g. http://play.golang.org/p/d4DBVsfeqJ

@extemporalgenome
Copy link
Contributor

@augustoroman, @stemar94 I see what you mean, and I now see why the proposal as originally specified is the way it is. Either way, it warrants good documentation.

@extemporalgenome
Copy link
Contributor

Are we also accounting for implementations of json.Marshaler and json.Unmarshaler (with the caveat that such implementations would be expected to en/decode JSON strings specifically)?

@augustoroman
Copy link
Contributor Author

No, the original proposal explicitly leaves json.Marshaler and json.Unmarshaler out. It seems ambiguous whether we would take the resulting json bytes and encode that as a string or check to see whether it corresponds to a json string, plus it seems somewhat inconsistent that we would allow some json.Marshalers but not others -- which may fail at runtime. (e.g. a given json.Marshaler could sometimes produce strings and sometimes not)

@extemporalgenome
Copy link
Contributor

@augustoroman json.Marshaler specifically generates encoded json (you wouldn't wrap its output further, but rather just pass it through after validating).

The JSON package already validates the output of json.Marshaler implementations:
http://play.golang.org/p/gLlXpswvV_

In this case, the validation would just ensure that it's a string, but otherwise pass it through.

I do agree that the "some but not others" point does make it less than desirable, but for the same reason the proposal uses string kinds directly (compatibility), if we're ever going to allow json.Marshaler's and json.Unmarshaler's here, we should do so now, since we'd have to completely invert the normal precedence to maintain compatibility if we add it later (string -> encoding.TextMarshaler -> json.Marshaler).

@augustoroman
Copy link
Contributor Author

@extemporalgenome I agree that adding it now is better than adding it later. I think we should not add it, however.

I don't see a need to allow json.Marshaler once we have encoding.TextMarshaler. That can be done with a trivial adapter, however the basic support for encoding.TextMarshaler is something that cannot be done with the current encoding package and thus warrants extending the package to add it now.

Also, the failure condition seems easier to understand:
"sorry, Foo in map[Foo]Bar is not a string nor an encoding.TextMarshaler"
rather than
"sorry, Foo in map[Foo]Bar is a json.Marshaler (and it encoded to valid JSON) but not a string."

On another note:
Despite writing the original proposal, I'm a bit bummed at the priority order even for string vs encoding.TextMarshaler -- I would much prefer the more consistent and rational order of using encoding.TextMarshaler first. I doubt that many actual programs would be affected, but there is a simple workaround (e.g. type Foo struct { X string }) and thus I lean towards the slightly convoluted API that maintains go1 compilation and functionality.

@extemporalgenome
Copy link
Contributor

@augustoroman that all sounds good. I withdraw the concerns I raised.

@nodirt
Copy link
Contributor

nodirt commented Oct 14, 2015

I think this is a fine proposal.

How json.Marshal/json.Unmarshal should behave in case of a flawed MashalText/UnmarshalText implementation that returns same text for different values?

func (c Coord) MarshalText() ([]byte, error) {
  return []byte(fmt.Sprintf("%s%s", c.X, c.Y)), nil
}

coords := []Coord {{11,2}, {1,12}}

IMO json.Unmarshal should just overwrite a value in a map. json.Marshal should generate duplicate keys because it won't make JSON invalid [1]

[1] https://tools.ietf.org/html/rfc7159#section-4. It says "The names within an object SHOULD be unique.", where "SHOULD" implies a recommendation, not a requirement.

@augustoroman
Copy link
Contributor Author

@nodirt Yes -- this is covered at the bottom of the original proposal (under "Considerations"). Interestingly, it won't "overwrite" the value in the map, it will merely output duplicate keys. When parsing such JSON, it's possible for UnmarshalText to generate unique keys each time, thereby making it possible to actually discover and handle such cases.

@nodirt
Copy link
Contributor

nodirt commented Oct 14, 2015

Right. Not sure how I missed that.

@cfchris
Copy link

cfchris commented Oct 17, 2015

This week I added 100+ lines of code to one of our packages to support MarshalJSON and UnmarshalJSON map[int]OurType. I also added 130+ lines of unit tests. While in the standard lib code, I could see it would be far fewer lines based on all of the unexported types and utility functions.

I would love to add this to the standard lib. What is the process for getting an issue approved and assigned?

@cfchris
Copy link

cfchris commented Oct 18, 2015

For reference, Chrome's JSON object exhibits the behavior that is being asked for.
e.g. (in console)

m = {};
m[22] = "foo";
m[33] = "bar";
JSON.stringify(m); // -> {"22":"foo","33":"bar"}
k = [1,2];
m[k] = "array";
JSON.stringify(m); // -> {"22":"foo","33":"bar","1,2":"array"}

@rsc rsc added this to the Proposal milestone Oct 24, 2015
@rsc
Copy link
Contributor

rsc commented Oct 24, 2015

This is a well-written proposal and is worth trying. Thank you.

However, the encoding/json package is suffering a bit from feature creep. I'd like to resolve what we're going to do about the omitempty/omitnil etc problems before we commit to adding even more features. So feel free to start a CL and see what unexpected issues you run into, but we may want to put off landing the CL until the Go 1.7 cycle. Depends on what comes up.

@cfchris, regarding comparison to Chrome: that's JavaScript, a completely dynamically typed language in which basically anything is allowed to happen at any time. There's not much that carries over to a statically typed language like Go. Also in the specific case of implicit conversions, JavaScript's decisions are fairly questionable. See https://www.destroyallsoftware.com/talks/wat for more.

@rsc rsc changed the title proposal: Allow json.Marshal / json.Unmarshal handle non-string map keys if they are encoding.TextMarshaler / encoding.TextUnmarshaler encoding/json: allow non-string map keys that implement encoding.TextMarshaler/TextUnmarshaler Oct 24, 2015
@rsc rsc modified the milestones: Go1.6, Proposal Oct 24, 2015
@augustoroman
Copy link
Contributor Author

@stemar94 I basically came up with the same change, except I didn't sort the keys when encoding -- good catch. The decoding seems fine to me. I'm going to extend your CL with documentation and tests.

The sorting may be problematic, IMO: Storing all of the stringified map keys in memory is painful, but at least we don't store all of the map values too. The JSON package doesn't mention the sorting anywhere, so in theory we don't have to respect that, however many people appear to rely on the generated JSON being stable, so it seems we shouldn't disrupt that lightly.

Anyone have opinions on how to handle sorting the keys? The issue is that the keys are currently assumed to be strings and sorted using reflect.Value.String(). For strings this returns the string, for anything else it returns the type, e.g. "<main.Foo Value>". Options are:

  1. Sort the keys by encoding all of the map keys up-front and then sorting the list of encoded strings.
  2. Don't change the current sorting, so that non-strings will be sorted using their (likely repeated) type as a string, and so generated output will probably differ from one encoding to the next for maps with more than 1 encoding.TextMarshaller.
  3. Stop sorting, so all encoding is different on each encoding.
  4. Check to see if the encoding type implements sort.Interface and use that if possible, otherwise fall back to pre-encoding the keys.

1 seems like the most reasonable option to me.

@extemporalgenome
Copy link
Contributor

@augustoroman since JSON objects are specifically unordered, why do we need to sort the keys?

@augustoroman
Copy link
Contributor Author

@extemporalgenome There's nothing that requires us to sort the keys: the JSON spec defines object keys as unordered (http://www.json.org/) and Go's hashmaps obviously don't maintain ordering.

The desire to sort the keys stems from maintaining consistency: the encoding/json package currently has the (undocumented) behavior that it always sorts the string keys lexicographically (e.g. http://stackoverflow.com/questions/18668652/how-to-produce-json-with-sorted-keys-in-go). With this, the encoded JSON is stable -- i.e. in tests you can compare against a fixed constant string to validate.

If we change this behavior for string keys, I think we will break a LOT of tests (and examples), so I would not want to do this without a very compelling reason.

We could consider option 2 where custom encoding.TextMarshaller would get sorted by type but within that type the order would be random -- that would only affect new code. But the behavior may be confusingly inconsistent.

@extemporalgenome
Copy link
Contributor

Arguably, existing tests wouldn't hit this code path, and new tests could have a better testing strategy, for example, instead of comparing the whole output against a known good value, each test could look in minimized JSON output for "<the-key>":<the-value> since that fragment of the JSON should be stable.

That said, if sorting is to occur, I imagine some people will want to do map[MyInt]string, and hope that keys will be sorted numerically. However, option 4 doesn't seem viable since element types can't meaningfully implement sort.Interface, only container types can (and there's no way to go searching at runtime for some T of underlying type []MyInt that also implements sort.Interface).

@augustoroman
Copy link
Contributor Author

I agree that tests probably shouldn't rely on sorting and I'd be comfortable forcing newly-written tests to adopt more granular testing (option 2). However, that also means that any examples that output JSON would be unstable and they would fail during tests, since the examples only allow using a string to compare the output to. That effectively means no examples that output JSON-encoded maps are viable.

I think that resolving the sorting some other way would be idea. Perhaps (in the future) we can have a "SortKeys(keys []reflect.Value)" or "SortedKeys() []interface{}" that the underlying map type may optionally implement to produce specifically sorted keys, and the encoding/json package will have an appropriate default for string keys.

@cfchris
Copy link

cfchris commented Nov 4, 2015

@augustoroman, I like option 1 for key sorting. I think that keys being in any type of numeric order is unnecessary. And although I agree that, in theory, JSON keys don't need to be sorted, having dependable output is handy for testing.

Where I work we have a mix of tests, some test against the whole expected value. And the other (better IMHO) ones test for fragments of the expected value.

@rsc
Copy link
Contributor

rsc commented Nov 5, 2015

Please keep sorting the keys. Thanks.

@rsc rsc modified the milestones: Unplanned, Go1.6 Nov 25, 2015
@rsc
Copy link
Contributor

rsc commented Nov 25, 2015

The proposal is accepted, but the work appears to have missed the cutoff for Go 1.6.

@cespare
Copy link
Contributor

cespare commented Feb 10, 2016

@stemar94 are you planning on submitting your CL for 1.7?

@stemar94
Copy link

@cespare I guess I could do that. But if there is someone more experienced with reflection or the with the json decoding and wants to implement this proposal, this would be fine with me as well.

@augustoroman
Copy link
Contributor Author

@stemar94 Thanks, I updated your CL with tests.

@stemar94
Copy link

stemar94 commented Mar 8, 2016

Can I abandon the CL, so you can be owner?
Changes look good, btw.

@augustoroman
Copy link
Contributor Author

Sure, I'll push a new CL, go ahead and abandon.

@gopherbot
Copy link

CL https://golang.org/cl/20356 mentions this issue.

@golang golang locked and limited conversation to collaborators Apr 5, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants