Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: UnmarshalJSON does not respect escaped unicode characters #47353

Open
rittneje opened this issue Jul 22, 2021 · 9 comments
Open

time: UnmarshalJSON does not respect escaped unicode characters #47353

rittneje opened this issue Jul 22, 2021 · 9 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@rittneje
Copy link

What version of Go are you using (go version)?

1.16.6

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

The playground.

What did you do?

Unmarshaled a JSON string containing an escaped unicode character into a time.Time.

https://play.golang.org/p/MV6VeYcalrQ

What did you expect to see?

I expected it to parse correctly as per the JSON spec.

What did you see instead?

It did not understand the escaped character.

parsing time ""2009-11-10T23:00:00\u005a"" as ""2006-01-02T15:04:05Z07:00"": cannot parse "\u005a"" as "Z07:00"

@dsnet
Copy link
Member

dsnet commented Jul 23, 2021

This is unfortunate. In order to fix this, we either:

  1. implement JSON string decoding in the time package (a correct implementation is not trivial),
  2. have the time package depend on the json package, or
  3. move just the JSON string decoding logic from the json package into some internal package that both time and json can access.

@dsnet
Copy link
Member

dsnet commented Jul 23, 2021

As an aside: it was a mistake to implement UnmarshalJSON methods on time.Time, and there should only have been the UnmarshalText methods since those take in the unescaped string.

@neild
Copy link
Contributor

neild commented Jul 23, 2021

Would it be plausible for the JSON package to special-case time.Time, ignoring (time.Time).UnmarshalJSON in favor of a correct implementation?

This obviously doesn't scale to all types everywhere, but perhaps it would suffice for standard library types where we regret adding an UnmarshalJSON method.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 23, 2021
@dsnet
Copy link
Member

dsnet commented Jul 23, 2021

Would it be plausible for the JSON package to special-case time.Time, ignoring (time.Time).UnmarshalJSON in favor of a correct implementation?

That's another option. One detriment is that it fails to detect cases of embedding:

type MyTime struct { time.Time }

Here MyTime forwards all methods of time.Time and then probably adds a few more. From the perspective of the json package, it cannot accurately tell if MyTime is promoting time.Time.{Marshal,Unmarshal}JSON or not. The best it can do is check that MyTime has an anonymous field of type time.Time, but reflect provides no way to tell if the UnmarshalJSON method is explicitly declared on the MyTime type (because perhaps they really did want different formatting) or was forwarded from time.Time.

@seankhliao
Copy link
Member

It also wouldn't fix it for other implementations of json (and yaml?) decoders

@dsnet
Copy link
Member

dsnet commented Jul 23, 2021

I suspect YAML decoders will be fine because they will probably use the UnmarshalText methods, but who knows.

@ianlancetaylor
Copy link
Contributor

As an aside: it was a mistake to implement UnmarshalJSON methods on time.Time, and there should only have been the UnmarshalText methods since those take in the unescaped string.

For what it's worth, time.Time.UnmarshalJSON was added before UnmarshalText existed. It was one of the reasons that UnmarshalText was invented. See https://golang.org/s/go12encoding.

@dsnet
Copy link
Member

dsnet commented Jul 23, 2021

Crazy idea: The json package can try the UnmarshalJSON method first. If it errors, the JSON token is a string, and a UnmarshalText method exists, then try the UnmarshalText method.

This ensures backwards compatibility where UnmarshalJSON takes precedence over UnmarshalText. This should be mostly compatible as all it would ever do is convert a few error cases to non-errors (presumably all desirable behavior changes).

EDIT: This approach assumes that either UnmarshalJSON is side-effect-free in the event of an error OR UnmarshalText discards all pre-existing state on the receiver. If either is not true, then we can run into silent data corruption.

At least for time.Time, we know that it's the latter since unmarshal overwrites the entirety of the receiver value.

@neild
Copy link
Contributor

neild commented Jul 23, 2021

Make it a specific sentinel error returned by UnmarshalJSON, and it'll be entirely backwards compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants