Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: time: make UnmarshalText, UnmarshalJSON enforce strict RFC 3339 semantics #57912

Open
rsc opened this issue Jan 18, 2023 · 18 comments
Open
Labels
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Jan 18, 2023

Work for #54580 made RFC3339 parsing adhere more strictly to the RFC. This was treated as a bug fix instead of a proposal, but we have evidence that it breaks at least some tests in the AWS SDK, so we've rolled it back for Go 1.20.

This new issue proposes to roll the change forward for Go 1.21, disabled by GODEBUG=timerfc3339strict=0. Thanks to #56986, this will mean that Go programs with 'go 1.20' or earlier in their go.mod files will get the lax parsing by default. Only modules that say 'go 1.21' or later will get the strict parser.

@liggitt
Copy link
Contributor

liggitt commented Jan 18, 2023

Hoisting my question from http://go.dev/cl/462285, I wondered how the non-default-behavior counter would work, and how godebug.Setting#Value() is intended to be called.

Would that counter increment every time an RFC3339 parse is performed, with the additional strictness checks being skipped completely if timerfc3339strict=0?

Or would the additional strictness checks be performed, but with failures ignored if timerfc3339strict=0, so the counter would represent calls that would fail but for the timerfc3339strict setting?

@rsc
Copy link
Contributor Author

rsc commented Jan 18, 2023

In general a counter for #56986 needs to increment only when the behavior changes. CL 462285 would need revision to get the increments into exactly the right places.

@rsc
Copy link
Contributor Author

rsc commented Jan 18, 2023

If anyone knows or can easily chase down the sources of any almost-RFC3339 times, that would be useful information.

@bcmills
Copy link
Contributor

bcmills commented Jan 18, 2023

Note that what actually ended up being implemented for #54580 (in https://go.dev/cl/444277) was for Time.UnmarshalTime and Time.UnmarshalJSON, not time.Parse(time.RFC3339, …).

@rsc rsc changed the title proposal: time: make Parse(RFC3339, s) enforce strict RFC 3339 semantics proposal: time: make UnmarshalText, UnmarshalJSON enforce strict RFC 3339 semantics Jan 18, 2023
@rsc
Copy link
Contributor Author

rsc commented Jan 18, 2023

Thank you, retitled.

@renthraysk
Copy link

git svn was modified to accept a single digit hour, as some subversion servers seems to be producing them.

git/git@784f4b6

@rsc
Copy link
Contributor Author

rsc commented Jan 19, 2023

Here is some data. On my laptop I have a copy of the latest tagged and untagged version of every module stored on the proxy (I throw away all the non-source files and zip them and they fit).

badrfc.txt contains all the matches for any of the three invalid RFC3339 times, including in test files. Sometimes a mention in a test file is testing that something is invalid, of course.

hour1.txt contains all the matches for times with single-digit hours.

The counts of unique lines matched in non-test files is:

 104 // Time the End-User's information was last updated. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time.
 104 // Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z
  56 // Example: `2019-02-01T0:59:00.789Z`
  27 // Timestamp for the transition of the alarm state. For example, the time when the alarm transitioned from OK to Firing. Available for state transition entries only. Note: A three-minute lag for this value accounts for any late-arriving metrics.  Example: `2019-02-01T0:59:00.789Z`
  24 "pendingTime" : "2000-00-01T0:00:00Z",
  16 // JSON number representing the number of seconds from 1970-01-01T0:0:0Z
  11 // number representing the number of seconds from 1970-01-01T0:0:0Z as
   8 Description: `Timestamp for the transition of the alarm state. For example, the time when the alarm transitioned from OK to Firing. Available for state transition entries only. Note: A three-minute lag for this value accounts for any late-arriving metrics. Example: ` + "`" + `2019-02-01T0:59:00.789Z` + "`" + ``,
   5 // 1970-01-01T0:0:0Z as measured in UTC until the date/time. This is the type
   4 // Time the End-User's information was last updated. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z
   3 []string{"2020-12-01T7:31:00Z", "", "2020-12-03T07:31:00Z", "", "2020-12-05T07:31:00Z",
   3 []string{"", "2020-12-02T7:18:00Z", "", "", "2020-12-05T7:18:00Z", "", "", "2020-12-09T7:18:00Z", "", ""})
   2 startTime, err := time.Parse(time.RFC3339, "2010-01-01T1:00:00-00:00")
   2 GenesisBlockConfirmedTime string = "2018-04-17T5:07:31.000000000Z"
   2 DateOfBirth:  "1960-10-17T0:00:00Z",
   2 //Time returns a time.Time equal to "2015-11-21T8:41:55Z".
   2 // representing the number of seconds from 1970-01-01T0:0:0Z as measured
   2 // Time when the End-User authentication occurred. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time. When a max_age request is made or when auth_time is requested as an Essential Claim, then this Claim is REQUIRED; otherwise, its inclusion is OPTIONAL. (The auth_time Claim semantically corresponds to the OpenID 2.0 PAPE [OpenID.PAPE] auth_time response parameter.)
   2 // REQUIRED. Time at which the JWT was issued. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time.
   2 // - 2021-04-25T10:30:00Z, 2021-04-25T3:30:00-0100 and 2021-04-25T12:30:00+0200 are equivalent, they represent the same point in time
   2 "Timestamp":           `[TIMESTAMP]`,           //string  ,  Unencoded: 2016-01-17T8:15:07.127-05
   1 iat string // REQUIRED. Time at which the JWT was issued. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time.
   1 exp string //	REQUIRED. Expiration time on or after which the ID Token MUST NOT be accepted for processing. The processing of this parameter requires that the current date/time MUST be before the expiration date/time listed in the value. Implementers MAY provide for some small leeway, usually no more than a few minutes, to account for clock skew. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time. See RFC 3339 [RFC3339] for details regarding date/times in general and UTC in particular.
   1 auth_time string // Time when the End-User authentication occurred. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time. When a max_age request is made or when auth_time is requested as an Essential Claim, then this Claim is REQUIRED; otherwise, its inclusion is OPTIONAL. (The auth_time Claim semantically corresponds to the OpenID 2.0 PAPE [OpenID.PAPE] auth_time response parameter.)
   1 // seconds from 1970-01-01T0:0:0Z as measured in UTC until the
   1 // [ updated_at ] {number} Time the end-user's information was last updated, as number of seconds since the Unix epoch (1970-01-01T0:0:0Z) as measured in UTC until the date/time.
   1 // UpdatedAt Time the End-User's information was last updated. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time.
   1 // REQUIRED. Expiration time on or after which the ID Token MUST NOT be accepted for processing. The processing of this parameter requires that the current date/time MUST be before the expiration date/time listed in the value. Implementers MAY provide for some small leeway, usually no more than a few minutes, to account for clock skew. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time. See RFC 3339Klyne, G., Ed. and C. Newman, “Date and Time on the Internet: Timestamps,” July 2002. [RFC3339] for details regarding date/times in general and UTC in particular.
   1 // REQUIRED. Expiration time on or after which the ID Token MUST NOT be accepted for processing. The processing of this parameter requires that the current date/time MUST be before the expiration date/time listed in the value. Implementers MAY provide for some small leeway, usually no more than a few minutes, to account for clock skew. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time. See RFC 3339 [RFC3339] for details regarding date/times in general and UTC in particular.
   1 //	updated_at	number          `json:"given_name"` // Time the End-User's information was last updated. Its value is a JSON number representing the number of seconds from 1970-01-01T0:0:0Z as measured in UTC until the date/time.
   1 $ ./papertrail-archive download --token your-token --from "2019-06-30T00:23:02+02:00" --to "2019-06-30T3:23:02+02:00"
   1 "updated_at":         "1970-01-01T0:0:0Z",

comma.txt contains all the matches for times with decimal commas. The non-test line counts are:

   6 Time string `json:"time,omitempty" example:"2019-02-20T22:01:44,654015561+00:00"`
   3 // {"type": "server", "timestamp": "2022-01-20T15:46:00,131Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elasticsearch", "node.name": "brandon-testing-elasticsearch", "message": "uncaught exception in thread [main]",
   2 elasticSearchTimestampFormat = "2006-01-02T15:04:05,000Z07:00"
   2 // {"type": "server", "timestamp": "2022-01-17T18:31:47,365Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elasticsearch", "node.name": "ubuntu-jammy", "message": "initialized" }
   2 // example: {"cluster.name":"docker-cluster","component":"o.e.t.TransportService","level":"INFO","message":"publish_address {172.18.0.4:9300}, bound_addresses {0.0.0.0:9300}","node.name":"3404ffa7b26c","timestamp":"2022-04-13T17:24:56,134Z","type":"server"}
   1 // {"type": "server", "timestamp": "2022-01-17T18:31:47,365Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elasticsearch", "node.name": "ubuntu-impish", "message": "initialized" }

Does Elasticsearch maybe localize its RFC3339 strings? In the test files, there are 123 instances of the comment

// utmpdump: [7] [14962] [ts/2] [vagrant ] [pts/2       ] [10.0.2.2            ] [10.0.2.2       ] [2019-01-24T09:51:51,367964+00:00]

Does utmpdump maybe localize its RFC3339 strings?

2460.txt contains all the matches for times with bad zones +hh:mm where hh >= 24 or mm >= 60. There are no non-test matches.

Disallowing +24:00 and +00:60 seem OK.

The outputs mentioning things like elasticsearch and utmpdump give me pause. I really didn't expect any comma times.

The quantity of output with shortened hours also gives me pause. This is clearly a common thing to find on the internet.

Is there a strong security reason to reject any of these, like there was for #30999?

@rsc
Copy link
Contributor Author

rsc commented Jan 19, 2023

Elaborating on the comparison to #30999, we disallowed non-zero octets with leading zeros because Go read those as decimal while BSD-derived systems read them as octal: Go read 18.032.4.011 as 18.32.4.11 while most C libraries read it as 18.26.4.9. That's a major inconsistency that could lead to any number of interesting security problems. Rejecting them outright eliminates the inconsistency: if Go and C both parse successfully, they agree on the result.

Are there any known inconsistencies in the handling of these three kinds of RFC3339 times?

  • For the 1-digit hour, there is a clear answer for what it means that everyone agrees on, and Go does that. I'm not surprised that it works nor that it happens in common usage. The syntax is clear and the meaning is valid.
  • For the decimal comma in the seconds, I'm a little surprised both that it works and that it happens in practice, but we have two examples that do not look hand-written. It is easy to see how printing a time with a C printf format string would result in that comma, though, since C printf is localized. The comma could cause tokenization problems, if say the times were in a comma-separated list, but tokenization has already happened when UnmarshalText or UnmarshalJSON are called. That is, the caller has identified a particular string as the entire input that should be parsed. At that point, again there is an obvious answer for what a successful parse would say the answer was. The meaning is clear and valid.
  • For +24:00 or +00:60, the syntax is clear but the meaning is arguably invalid. There is also no evidence (yet) that these occur in practice. Disallowing these seems potentially OK.

Any change we make should have a GODEBUG, of course, but if people have old data with these times, it will mean that they have to set the GODEBUG indefinitely when parsing that data. And if libraries connect to servers or read from programs that generate these times (such as apparently some versions of Subversion, possibly Elasticsearch, and utmpdump), those libraries have to either stop doing direct JSON unmarshal into time.Time, force callers to set the godebug, set the GODEBUG themselves by environment manipulation, or rewrite the entire JSON input before calling unmarshal. These are all sufficiently painful as to caution against making the change at all, without some overriding benefit like the security argument in #30999.

With the current data, it sounds like we should permanently remove the 1-digit hour and decimal comma checks. Maybe we could keep the +24:00 and +00:60 behind a GODEBUG, or maybe it's not worth the bother.

@rsc
Copy link
Contributor Author

rsc commented Jan 19, 2023

I went looking to see which C programs I could coerce into printing decimal commas in RFC3339 times using locales. Utmpdump doesn't need any coercion at all - it appears to have the decimal comma hard-coded.

$ LANG=C LC_ALL=C utmpdump /var/log/wtmp | tail -5
Utmp dump of /var/log/wtmp
[7] [2906041] [ts/1] [rsc     ] [pts/1       ] [172.253.31.2        ] [172.253.31.2   ] [2023-01-18T18:08:56,374969+00:00]
[8] [2906041] [    ] [        ] [pts/1       ] [                    ] [0.0.0.0        ] [2023-01-18T19:00:57,284248+00:00]
[7] [2916903] [ts/1] [rsc     ] [pts/1       ] [172.253.31.0        ] [172.253.31.0   ] [2023-01-18T20:28:37,954841+00:00]
[8] [2916903] [    ] [        ] [pts/1       ] [                    ] [0.0.0.0        ] [2023-01-19T06:09:53,875435+00:00]
[7] [3016807] [ts/1] [rsc     ] [pts/1       ] [172.253.31.2        ] [172.253.31.2   ] [2023-01-19T13:31:14,160542+00:00]
$ 

@fzipp
Copy link
Contributor

fzipp commented Jan 19, 2023

@bcmills
Copy link
Contributor

bcmills commented Jan 19, 2023

those libraries have to either stop doing direct JSON unmarshal into time.Time, force callers to set the godebug, set the GODEBUG themselves by environment manipulation, or rewrite the entire JSON input before calling unmarshal. These are all sufficiently painful as to caution against making the change at all […].

I agree that setting GODEBUG in perpetuity seems like a non-starter, and rewriting the JSON input sounds fragile. But why would it be particularly painful to require libraries that want to use non-standard time formats to use a non-standard unmarshaling function, or a field of a non-standard type? (And if someone is using a data format with a non-standard time format, shouldn't they be using a type that also marshals to that format?)

@rsc
Copy link
Contributor Author

rsc commented Jan 19, 2023

Good points, thanks. More data about how common these are (or alternately why they arise) would still help.

@aajtodd
Copy link

aajtodd commented Jan 19, 2023

👋 Hello from AWS SDK for Go team.

The issue and rollback CR both mention AWS SDK test(s) breaking.

However, this change seems to have broken an AWS S3 unit test

We just want to clarify was the breakage only in the minio project or were there also issues in the AWS SDK for Go (v1 or v2) projects?

It would not surprise me if there were AWS services that return non compliant timestamps with e.g. single digit hours. However, v1 and v2 AWS SDK for Go projects don't rely on Time.UnmarshalJSON or Time.UnmarshalText (they rely on custom parsers/formatters v1/v2). I don't think we are affected by this change but wanted to get clarification first on what tests broke.

@dims
Copy link

dims commented Jan 19, 2023

@aajtodd so far what we know broke is in this issue - #57897 (comment) - we don't have evidence that AWS SDK is broken. yes like you i feel the same ("It would not surprise me if there were AWS services that return non compliant timestamps with e.g. single digit hours"), but that's something we need to chase internally.

@dsnet
Copy link
Member

dsnet commented Jan 19, 2023

I wonder how much of the compatibility issues arise from the difference between ISO 8601 and RFC 3339. The former is a complicated collection of grammars for valid timestamps, while RFC 3339 is a specific subset of ISO 8601 with the primary intent of avoiding the very compatibility issues we're discussing now. The more systems that claim to use RFC 3339, but not really are making the problem worse as it dilutes the exact meaning of RFC 3339.

However, that doesn't explain everything, such as why single-digit hours are so common. For example, ISO 8601 still seems to require a two-digit hour fields as it says "hour is represented by two digits from [00] to [24]." That said, ISO 8601 is behind a paywall and I doubt many people actually read it. Some second hand sources that claim to describe ISO 8601 incorrectly mention that hour fields can have one or two digits.

Does utmpdump maybe localize its RFC3339 strings?

According to the man page for utmpdump v2.34, it "[uses] millisecond precision ISO-8601 timestamp format". A comma separator is valid for ISO 8601, but not RFC 3339.

comma.txt contains all the matches for times with decimal commas. The non-test line counts are:

The ability to use comma was recently added in Go 1.17 for #43823. Most likely, the impact of that change to Parse for parsing of RFC 3339 was unintentional. The small number of cases is probably related to the young age of when this was accidentally introduced.

set the GODEBUG themselves by environment manipulation, or rewrite the entire JSON input before calling unmarshal.

This could also be solved with #21990, where we could somehow support parsing timestamps according to ISO 8601 (or some looser format) instead of RFC 3339.

don't have evidence that AWS SDK is broken

AWS itself is internally inconsistent about whether it uses RFC 3339 or ISO 8601 as both appear in their documentation:

I'm not claiming AWS SDK is broken, just that there's much evidence that ISO 8601 is still in heavy use and the time package provides no good support for it.

aajtodd added a commit to aws/aws-sdk-go that referenced this issue Jan 23, 2023
Removes timestamp with single digit hour code to avoid confusion. Also
fixes lint issues.

References:
* golang/go#57912 (comment)
* GOSDK-2932
aajtodd added a commit to aws/aws-sdk-go-v2 that referenced this issue Jan 23, 2023
Removes timestamp with single digit hour code to avoid confusion. Also
fixes lint issues.

References:
* golang/go#57912 (comment)
* GOSDK-2932
@apparentlymart
Copy link

apparentlymart commented Jan 25, 2023

I was originally in favor of having the functions which are specifically documented as accepting RFC 3339 strings strictly implement that specification, while retaining time.Parse for those who need to parse timestamps that are not exactly RFC 3339.

However, once I saw the impact of this on existing code I started to lean more towards changing the documentation of time.Time.UnmarshalText and time.Time.UnmarshalJSON to document what it actually accepts, and treat that as the new expected behavior of that method. These "unmarshal" functions feel to me like a sort of liberal default behavior intended to be useful in a broad number of situations -- "I just want timestamps in my JSON and I don't really care exactly what format they are in" -- and those who need a particular format would be better off using something tailored to that purpose so their intent is more explicit.


With that said, I do maintain an application which itself exposes RFC 3339 parsing to its users, originally via Go's implementation, and that application is not required to consume such a wide range of input and so I prefer to strictly require RFC 3339 to minimize the risk of someone accidentally relying on a behavior that isn't guaranteed nor documented. In response to the decision to revert the stricter parsing in stdlib I have for now just copied the stricter parser into my own package so that my application's interpretation of timestamps will be independent of the Go stdlib's.

f the time package were to gain a new function that is specified as exactly what RFC 3339 requires and nothing more -- whether it be UnmarshalText/UnmarshalJSON as proposed here or something else entirely -- then I would probably switch to it, but I'm also open to considering my needs to be more specialized than the needs of the Go stdlib and so I feel okay about maintaining my own copy.

Overall it feels most pragmatic to loosen the documented contract for the existing functions and let those with stricter requirements (like me) solve that independently, and so I'm a soft 👎 for this proposal as currently written, but I personally wouldn't be negatively effected by the change and would slightly benefit from it.

aajtodd added a commit to aws/aws-sdk-go that referenced this issue Jan 25, 2023
Removes timestamp with single digit hour code to avoid confusion. Also
fixes lint issues.

References:
* golang/go#57912 (comment)
* GOSDK-2932
aajtodd added a commit to aws/aws-sdk-go-v2 that referenced this issue Jan 25, 2023
Removes timestamp with single digit hour code to avoid confusion. Also
fixes lint issues.

References:
* golang/go#57912 (comment)
* GOSDK-2932
@ulikunitz
Copy link
Contributor

ulikunitz commented Jan 27, 2023

Regarding the comma (',') in seconds. The grammar in the appendix A of RFC 3339 describes ISO 8601:1998 but the Internet Date/Time Format is defined in section 5 of the RFC. The two grammars differ. Section 5 does not support the comma, but appendix A does. There are a lot of differences: e.g. Appendix A allows two-digit years, Section 5 requires four-digit years.

The Internet Date/Time Format specifies a subset of date/time stamps defined by ISO8601.

By the way ISO8601 (or at least ISO 8601:2004) says that the comma is the preferred sign.

@gopherbot
Copy link

Change https://go.dev/cl/521501 mentions this issue: time: roll forward with strict RFC 3339 parsing checks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests