Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: timezone name heuristic in Parse rejects MSK and MSD #3790

Closed
gopherbot opened this issue Jul 1, 2012 · 31 comments
Closed

time: timezone name heuristic in Parse rejects MSK and MSD #3790

gopherbot opened this issue Jul 1, 2012 · 31 comments
Milestone

Comments

@gopherbot
Copy link

by k.shutemov:

I found that that current heuristic for parsing stdTZ is too strict:

go/src/pkg/time/format.go
 809                         if len(value) >= 3 && value[2] == 'T' {
 810                                 p, value = value[0:3], value[3:]
 811                         } else if len(value) >= 4 && value[3] == 'T' {
 812                                 p, value = value[0:4], value[4:]
 813                         } else {
 814                                 err = errBad
 815                                 break
 816                         }

As you can see it only accepts timezone names with 'T' at end. It's not
correct: "MSK" is a valid timezone name.
@patrickmn
Copy link

Comment 1:

For reference, MSK is now (since March 2011) MSD, which is UTC+4.

@rsc
Copy link
Contributor

rsc commented Sep 12, 2012

Comment 2:

Labels changed: added priority-later, removed priority-triage.

Status changed to Accepted.

@rsc
Copy link
Contributor

rsc commented Sep 12, 2012

Comment 3:

Labels changed: added go1.1maybe.

@rsc
Copy link
Contributor

rsc commented Dec 10, 2012

Comment 4:

An experiment to run. Load every location in the tz directory. Convert
Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec 1 00:00:00 2012 UTC to local time and
record time zone abbreviation.
What's that list? Are MSK and MSD the only exceptions to the [A-Z]{2,3}T pattern?

@gopherbot
Copy link
Author

Comment 5 by k.shutemov:

Here's results:
AZOST
CHADT
CHAST
ChST
EASST
GMT-1
GMT+1
GMT-10
GMT+10
GMT-11
GMT+11
GMT-12
GMT+12
GMT-13
GMT-14
GMT-2
GMT+2
GMT-3
GMT+3
GMT-4
GMT+4
GMT-5
GMT+5
GMT-6
GMT+6
GMT-7
GMT+7
GMT-8
GMT+8
GMT-9
GMT+9
MeST
MSK
UTC
WARST
zzz
The script I've used and raw results: https://gist.github.com/4250471

@gopherbot
Copy link
Author

Comment 6 by k.shutemov:

The same, but I've included historical timezones for 1850 -- 2012 years
AKTST
ALMST
ANAST
AQTST
ASHST
AZOMT
AZOST
BAKST
BEAUT
BORTST
CHADT
CHAST
CHOST
ChST
CKHST
DUSST
EASST
FRUST
GMT-1
GMT+1
GMT-10
GMT+10
GMT-11
GMT+11
GMT-12
GMT+12
GMT-13
GMT-14
GMT-2
GMT+2
GMT-3
GMT+3
GMT-4
GMT+4
GMT-5
GMT+5
GMT-6
GMT+6
GMT-7
GMT+7
GMT-8
GMT+8
GMT-9
GMT+9
HOVST
IRKST
KIZST
KRAST
KUYST
MADMT
MADST
MAGST
MALST
MeST
MSD
MSK
NOVST
OMSST
ORAST
PETST
QYZST
SAKST
SAMST
SHEST
SVEST
TASST
TBIST
ULAST
URAST
UTC
UYHST
VLASST
VLAST
VOLST
WARST
YAKST
YEKST
YERST
zzz
Updated script and raw results: https://gist.github.com/4251094

@rsc
Copy link
Contributor

rsc commented Dec 11, 2012

Comment 7:

Thanks very much. Looks like the allowe patterns should be:
MSK
MSD
UTC
GMT
GMT[+-][0-9]+ w/ nonzero number between -14 and +12
[A-Z][A-Za-z]?[A-Z]?[DMSU]T
(I'm not suggesting to use package regexp, just using that notation to
express what should be allowed.)

@rsc
Copy link
Contributor

rsc commented Feb 19, 2013

Comment 8:

Windows uses UTC[+-][0-9]+ sometimes. Perhaps we should allow those in addition to
GMT[+-][0-9]+.
See notes on issue #4838.

@rsc
Copy link
Contributor

rsc commented Mar 12, 2013

Comment 9:

[The time for maybe has passed.]

Labels changed: removed go1.1maybe.

@minux
Copy link
Member

minux commented Jun 9, 2013

Comment 10:

Issue #5672 has been merged into this issue.

@matrixik
Copy link

Comment 11:

I found one more:
MST
--- FAIL: TestLocalZoneAbbr (0.00 seconds)
        zoneinfo_windows_test.go:18: Parse failed: parsing time "Sat, 22 Jun 2013 13:42:36 +0200" as "Mon, 02 Jan 2006 15:04:05 MST": cannot parse "+0200" as "MST"
Windows 7 x32
go version devel +56909cb770fe Fri Jun 21 18:07:57 2013 -0700 windows/386
Best regards,
Dobrosław Żybort

@alexbrainman
Copy link
Member

Comment 12:

matrixik,
Do you mind running this program http://play.golang.org/p/vCsioBDpMk on your computer
and reporting results back here? I suspect your syscall.Timezoneinformation is different
from what Go supports at this moment. And I am wonering why and what to do about it.
Thank you.
Alex

@matrixik
Copy link

Comment 13:

Alex,
Results:
```
i=syscall.Timezoneinformation{Bias:-60, StandardName:[32]uint16{0x15a, 0x72, 0x6f, 0x64,
0x6b, 0x6f, 0x77, 0x6f, 0x65, 0x75, 0x72, 0x6f, 0x70, 0x65, 0x6a, 0x73, 0x6b, 0x69,
0x20, 0x63, 0x7a, 0x61, 0x73, 0x20, 0x73, 0x74, 0x61, 0x6e, 0x64, 0x2e, 0x0, 0x0},
StandardDate:syscall.Systemtime{Year:0x0, Month:0xa, DayOfWeek:0x0, Day:0x5, Hour:0x3,
Minute:0x0, Second:0x0, Milliseconds:0x0}, StandardBias:0,
DaylightName:[32]uint16{0x15a, 0x72, 0x6f, 0x64, 0x6b, 0x6f, 0x77, 0x6f, 0x65, 0x75,
0x72, 0x6f, 0x70, 0x65, 0x6a, 0x73, 0x6b, 0x69, 0x20, 0x63, 0x7a, 0x61, 0x73, 0x20,
0x6c, 0x65, 0x74, 0x6e, 0x69, 0x0, 0x0, 0x0}, DaylightDate:syscall.Systemtime{Year:0x0,
Month:0x3, DayOfWeek:0x0, Day:0x5, Hour:0x2, Minute:0x0, Second:0x0, Milliseconds:0x0},
DaylightBias:-60}
i.StandardName="Środkowoeuropejski czas stand."
i.DaylightName="Środkowoeuropejski czas letni"
```
Where:
Środkowoeuropejski czas stand. = Central European Standard Time.
Środkowoeuropejski czas letni = Central European Summer Time
Best regards,
Dobrosław Żybort

@alexbrainman
Copy link
Member

Comment 14:

matrixik,
Thank you for the report. I think your issue is different, so I am starting new issue
https://golang.org/issue/5783. Lets continue our conversation there.
Alex

@rsc
Copy link
Contributor

rsc commented Jul 30, 2013

Comment 15:

I don't fully understand what remains here but let's finish it for Go 1.2.

Labels changed: added go1.2.

@robpike
Copy link
Contributor

robpike commented Jul 31, 2013

Comment 16:

Owner changed to @robpike.

@snaury
Copy link
Contributor

snaury commented Aug 6, 2013

Comment 17:

Is there a CL for this issue? What about something like this?
diff -r da11b2a1cbc1 src/pkg/time/format.go
--- a/src/pkg/time/format.go    Tue Aug 06 12:04:08 2013 -0700
+++ b/src/pkg/time/format.go    Wed Aug 07 01:18:31 2013 +0400
@@ -939,8 +939,17 @@
            } else if len(value) >= 4 && value[3] == 'T' {
                p, value = value[0:4], value[4:]
            } else {
-               err = errBad
-               break
+               var i int
+               for i = 0; i < len(value); i++ {
+                   if value[i] < 'A' || 'Z' < value[i] {
+                       break
+                   }
+               }
+               if i != 3 && i != 4 {
+                   err = errBad
+                   break
+               }
+               p, value = value[0:i], value[i:]
            }
            for i := 0; i < len(p); i++ {
                if p[i] < 'A' || 'Z' < p[i] {
Making any 3 or 4 uppercase letters a timezone.

@snaury
Copy link
Contributor

snaury commented Aug 6, 2013

Comment 18:

On the other hand it might not be enough. This shows:
http://www.timeanddate.com/library/abbreviations/timezones/
That timezones may be as short as 1 character, and there's one timezone with a lowercase
letter in it: ChST.

@robpike
Copy link
Contributor

robpike commented Aug 14, 2013

Comment 19:

Russ's pattern is
MSK
MSD
UTC
GMT
GMT[+-][0-9]+ w/ nonzero number between -14 and +12
[A-Z][A-Za-z]?[A-Z]?[DMSU]T
but the only lower-case letter apperas to be 'h', so I suggest
[A-Z][A-Zh]?[A-Z]?[DMSU]T

@robpike
Copy link
Contributor

robpike commented Aug 14, 2013

Comment 20:

The full list at http://www.timeanddate.com/library/abbreviations/timezones/ shows that
the [DMSU] check is wrong. there are many time zones that don't have that property, PWT
for one.  Thank you QuickCheck.

@robpike
Copy link
Contributor

robpike commented Aug 14, 2013

Comment 21:

I have an idea. Details to follow.

@nigeltao
Copy link
Contributor

Comment 22:

If you need to scrape that page, this program:
--------
package main
import (
    "fmt"
    "log"
    "net/http"
    "code.google.com/p/go.net/html"
)
func walk(root *html.Node, f func(*html.Node)) {
    f(root)
    for n := root.FirstChild; n != nil; n = n.NextSibling {
        walk(n, f)
    }
}
func main() {
    res, err := http.Get("http://www.timeanddate.com/library/abbreviations/timezones/")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    doc, err := html.Parse(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    walk(doc, func(n *html.Node) {
        if n.Type != html.ElementNode || n.Data != "tr" {
            return
        }
        n = n.FirstChild
        if n == nil || n.Type != html.ElementNode || n.Data != "td" {
            return
        }
        n = n.FirstChild
        if n == nil || n.Type != html.ElementNode || n.Data != "a" {
            return
        }
        n = n.FirstChild
        if n == nil || n.Type != html.TextNode {
            return
        }
        fmt.Println(n.Data)
    })
}
--------
prints:
--------
A
ACDT
ACST
ADT
ADT
AEDT
AEST
AFT
...
YAPT
YEKST
YEKT
Z
--------
Note that some rows (e.g. ADT) are repeated.

@robpike
Copy link
Contributor

robpike commented Aug 15, 2013

Comment 23:

This issue was updated by revision 727b2b6.

Handle time zones like GMT-8.
The more general time zone-matching problem is not yet resolved.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12922043

@robpike
Copy link
Contributor

robpike commented Aug 15, 2013

Comment 24:

Much to my disappointment, that URL doesn't have all the three-letter abbreviations even
listed on my own machine. I think it's hopeless to have a definitive list, especially as
it's sure to change as local governments decide to feel important by inventing their own
time zones (I'm looking at you, Eucla). So it's sad but the right answer is probably
just some simple sanity checks. CL coming.

@robpike
Copy link
Contributor

robpike commented Aug 15, 2013

Comment 25:

This issue was closed by revision a454d2f.

Status changed to Fixed.

@gopherbot
Copy link
Author

Comment 26 by diogomfranco:

That commit doesn't handle the *many* 5-letter abbreviations in the lists above --
they're almost all rejected since they don't have a T as the 4th character, but as the
last. The lists above also happen to not include my local timezone, ESAST (according to
windows).

@gopherbot
Copy link
Author

Comment 27 by diogomfranco:

(or treated as 3-letter timezones, which I assume breaks parsing later on due to the
leftover 2-letter garbage (why can't googlecode let me edit my comment))

@alexbrainman
Copy link
Member

Comment 28:

diogomfranco, please run this program http://play.golang.org/p/vCsioBDpMk on your
computer and post results here, so we know what your time zone is. Thank you.
Alex

@robpike
Copy link
Contributor

robpike commented Aug 16, 2013

Comment 29:

This issue was closed by revision 4c855f3.

@gopherbot
Copy link
Author

Comment 30 by diogomfranco:

re #28:
i=syscall.Timezoneinformation{Bias:180, StandardName:[32]uint16{0x45, 0x2e, 0x20, 0x53,
0x6f, 0x75, 0x74, 0x68, 0x20, 0x41, 0x6d, 0x65, 0x72, 0x69, 0x63, 0x6}
i.StandardName="E. South America Standard Time"
i.DaylightName="E. South America Daylight Time"

@alexbrainman
Copy link
Member

Comment 31:

diogomfranco, I just changed my timezone to "(GMT-03:00) Brasilia" and run time test:
C:\go\root\src>go test time
ok      time    11.141s
C:\go\root\src>hg id
9fb09f7edb82 tip
The test passes. So, I suspect, you shouldn't have any problems too.
Alex

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants