Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: Parse too greedy when parsing some formats #12919

Open
ksshannon opened this issue Oct 13, 2015 · 4 comments
Open

time: Parse too greedy when parsing some formats #12919

ksshannon opened this issue Oct 13, 2015 · 4 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@ksshannon
Copy link
Contributor

  1. What version of Go are you using (go version)?

    $ go version go version devel +2d823fd Tue Oct 13 01:04:42 2015 +0000 linux/amd64

  2. What operating system and processor architecture are you using?

    linux/amd64

  3. What did you do?

    Attempted to parse this time: "618 AM MDT TUE OCT 13 2015" with this format string: "304 PM MST Mon Jan 02 2006"

  4. What did you expect to see?

    No parsing error, and a time.Time representing that moment.

  5. What did you see instead?

    The parsing fails, and the error reported: parsing time "618 AM MDT TUE OCT 13 2015": hour out of range. The hour is interpreted as 61.

    The format works properly with a two digit hour. It seems getnum() in time/format.go takes up to two characters whenever the second character is valid which leads to the issue.

  6. Playground example with examples that fail and pass for the given format:

    http://play.golang.org/p/mZ-m7gJlvy

Apologies if my code is messy...

The format is used as a time/version stamp for the US National Weather Service in their forecast discussion products, for a real world example, http://www.wrh.noaa.gov/total_forecast/getprod.php?wfo=boi&pil=AFD&sid=BOI&version=1

Thank you.

@ianlancetaylor ianlancetaylor changed the title time.Parse() too greedy when parsing some formats time: Parse too greedy when parsing some formats Oct 13, 2015
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Oct 13, 2015
@ianlancetaylor
Copy link
Contributor

Basically, this means that we should change getnum to only take a single digit if taking two digits would cause the value to be out of range. Right now it always takes two digits if there are two digits, and then later checks whether they are out of range. I can't decide whether this change would be a good idea or not.

@ksshannon
Copy link
Contributor Author

That feels like it would invalidly parse (or attempt to) a lot of values. I think that would cause more problems that solve. Without field width, or field separators, I don't think you can solve all cases for this format (without making assumptions that may make some invalid strings valid). Thank you.

@nerdatmath
Copy link
Contributor

This particular format is unambiguous, since "304 PM" indicates a 1-2 digit hour followed by exactly 2 digits for minutes, then whitespace, then an AM/PM indicator. The problem is, when you're looking at a starting digit of 1 for the hour, you don't know whether you're dealing with hour 1, 10, 11, or 12. Without the AM/PM indicator (24 hour clock), we would have the same issue with starting digits of 0 or 1.

One way to resolve this would be to look at 304 as a special case, in which we gather 3-4 digits and then take the last 2 as minutes and the first 1-2 as the hour. 30405 could be treated the same way (5-6 digits). But then there's also 102, 106, 12006, 10206, 1022006, and so on. There are 5 variable width numeric fields and 5 corresponding fixed width numeric fields, plus 2 numeric year representations, so 5 * (2^4-1) * 3 + 5 * 2 = 235 unambiguous representations that we don't handle currently. Even more if you allow the variable width field to be somewhere other than at the front.

Still, I can envision code that could handle all those situations gracefully. One option: when encountering a single digit format specifier, verify that the remaining adjacent numeric fields are fixed width, and look ahead to count the number of digits available. If there is an extra digit, parse 2, otherwise parse 1.

If you have adjacent variable width numeric fields in the format, such as "34", this should probably be an "ambiguous layout" error, at least when parsing.

@b1tfury
Copy link

b1tfury commented Oct 22, 2015

Instead of treating all ambiguities I suggest that getnum function is doing its job perfectly only problem I see is that we need to throw a different error when input layout for HH:MM is given as H:MM / HH:M.
As our program is failing in only these cases only. And also changing getnum will have multiple side effects.

@agnivade agnivade added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

5 participants