New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: time: improve performance of calendar algorithms and clarify its specification #63844
Comments
kindly CC @rsc (per https://dev.golang.org/owners) |
It should be possible to split codepath into slow and fast versions depending on the year value without defining any new constraints on the year value, shouldn't it? |
Thanks @AlexanderYastrebov . It's possible, but I'm not sure if it's suitable. I've changed some functions, like absDate, daysSinceEpoch and Date, and we would add some complexity (and another branch) for a range of date that is unusal and that other languages already don't support. IMHO we should even add some asserts and panic if the year is outside the "new" specification. Make sense? |
Panicking is a bit extreme, innit? If I understand Cassio Neri's example C++ code correctly, he restricts values only for the duration of the calculation. Something like this (but also handling negative years):
I would propose limiting timestamps to those that can be represented by int64 (i.e., handling overflow), or close to it (capping the year value to prevent overflow), and using the mechanism above to deal with calculations whose intermediate values may overflow.
UTC is not the only kind of UT, so this shouldn't prevent me from printing the time when atoms started forming with nanosecond precision in my local timezone, so what if my city or my galaxy doesn't exist yet ;) |
I wish to share my experience after having studied the Gregorian calendar and implemented myself support for it in the Linux Kernel, libstdc++, Firefox (for its Javascript engine) and also assisted others to implement them for .NET and Java (ongoing) official standard libraries and also a Rust user library. The range of supported dates must be very clearly specified by the language as it is a contract which users can rely on. This is the first piece of information that I searched when I implemented these algorithms for any language's standard library. What should be a reasonable range of dates selected by the language? There's no absolute answer for this question and @normandesjr's proposal cites a few choices taken by different languages. Pragmatism is required to select one since complications come from historical, geographical and astronomical aspects. To know more about the historical and geographical points, please, see my talk at C++ Now 2023 (from 27min7s) but let me elaborate on the astronomical point raised by @normandesjr and commented by @unixdj. This question is not about UTC and time zones. Astronomers calculate that in the time of the dinosaurs, during a year Earth would spin about 400 times and not the 365 or so that it does nowadays. In other words, years were 400 days or so long. If there were people and computers around at that time they would most certainly not be using a calendar remotely similar to the Gregorian and thus for us to talk about 1st January year -65,000,000 is a bit of nonsense. Similarly, in a few million years the number of days in a year is expected to be far less than 365. I doubt the Gregorian calendar will still be used at this point. We don't need to go that far and even if Earth's movements were steady, the fact that the ratio between the duration of a day and the duration of a year is an irrational number means that the Gregorian calendar will be out of sync with Solstices and Equinoxes in a couple of thousands of years which again raises the question whether the Gregorian calendar will be used, unchanged, at this point. In light of the above, should Golang commit itself to support a range of Gregorian dates spanning billions of years? IMHO, it should restrict the range to a more pragmatic one that suits well and efficiently the large majority of users now and in many generations to come but no more. A more restricted range allows more algorithms to be used and the best performing one (known or to be discovered) can be implemented. The point is that current users should not pay the price of using less performant algorithms that are destined to become obsolete far in the future just to support the current calendar far beyond its existence. @normandesjr's proposal suggests picking the same range as C++ (from year -32767 to year 32767), which I believe is a very sensible choice, but anything not much larger than a few thousands of years is more than enough. Notice that changing the internal representation of time and its resolution is not proposed. Only conversion of internal representation from and to calendar dates (year, month, day) are touched. Another advantage of a more restrictive range is that it allows a unit test to run through all possible dates and check the result. I've done that for libstdc++ and the Linux Kernel and these tests run under a second. Finally, panicking if the input is out of range is useful. For Firefox, I added just a comment stating support for the range required by the JavaScript specification (Ecma-262). The thoughtful reviewer, Andre Bargul (towards whom I'm very grateful) asked me to assert that condition, which I did. Then a CI test crashed Firefox on this assertion revealing that there's was a bug elsewhere. He then fixed the bug for the benefit of all users. I hope this helps. Obviously I'd be glad to see my own algorithms being used in Golang but the question of precision of the specification is, in my opinion, more important than the choice of algorithm. @normandesjr's proposal tackles this point. It's a very good proposal. |
Definitely. I'm inclined to make this range as broad as possible, because if you have a Also, see below regarding API compatibility.
Sure, I was not entirely serious about local time. And I was mistaken about UT, I meant TAI (International Atomic Time). Loved your talk, BTW.
I would argue that perpetual Gregorian calendar is a fiction anyway, and astrophysicists talking about "13 billion years ago" is also a bit of nonsense, as you can't have a year without a planet. Unless you define a year as 31,556,952 SI seconds, then it's fine I guess.
The year range that tzcode (zoneinfo's code used in glibc and libc in many other Unix systems) supports is -2,147,481,748 to 2,147,485,547, which is the full range of
Here I disagree. The Go standard library generally panics only on application programmer errors or if the internal state goes haywire. E.g., the
Besides, introducing panics into date calculation code would break the Go 1 API compatibility promise. Which is another reason to support as wide range as possible. |
Indeed, astrophysicists might be interested in billions of years. But this is a niche domain with clever people that are very capable of rolling their own algorithms.
For two reasons.
Hence, even if |
Interesting, I didn't know this.
True. But I suspect that in many cases In any case, the Go core team will likely insist that the Go 1 compatibility promise is not broken. Look, I understand your point. You never need to display the time of the formation of a galaxy as Not the whole range works now, mind you. The But I'm just not sure how much you can further restrict the range without breaking the compatibility promise that the Go team takes quite seriously. And I hope you can see why panicking is not a good solution for Go, or at least for Go 1. I'm pretty sure the Go team would prefer to cap the value, return sentinel values or even document the limitation and return garbage. Speaking of which, wouldn't a check like
be almost as fast as this, for times within the range?
|
As I wrote in #63844 (comment) it should be possible to have two implementations conditioned on the year value. The existing implementation would need to stay anyways to verify correctness of the new code. Once proof of the pudding exists it might be easier to establish new API constraints. |
There is also a related #56909 that reports year overflow above/below certain values |
I've given an idea about panic, it was like a suggestion, the main idea here is to have the great benefit of a date algorithm faster (more than 25%) for a range that everybody needs. |
…ification This implementation improves the performance of Go's time package using the recent algorithms developed by Neri and Schneider [1]. Their algorithms are faster than alternatives, my benchmarks confirm performance improvements in Go as well. ``` goos: linux goarch: amd64 pkg: time cpu: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz | /tmp/master.txt | /tmp/nerishcneider.txt | sec/op | sec/op vs base Now-8 69.38n ± 0% 69.33n ± 0% -0.07% (p=0.001 n=20) NowUnixNano-8 70.35n ± 0% 70.08n ± 0% -0.38% (p=0.000 n=20) NowUnixMilli-8 71.47n ± 0% 71.26n ± 0% -0.30% (p=0.000 n=20) NowUnixMicro-8 71.32n ± 0% 71.21n ± 0% -0.15% (p=0.000 n=20) Format-8 441.9n ± 0% 433.8n ± 0% -1.83% (p=0.000 n=20) FormatRFC3339-8 293.9n ± 0% 283.4n ± 0% -3.57% (p=0.000 n=20) FormatRFC3339Nano-8 297.5n ± 0% 286.9n ± 0% -3.55% (p=0.000 n=20) FormatNow-8 242.4n ± 0% 239.7n ± 0% -1.11% (p=0.000 n=20) MarshalJSON-8 144.0n ± 0% 136.2n ± 0% -5.42% (p=0.000 n=20) MarshalText-8 142.0n ± 0% 136.2n ± 0% -4.02% (p=0.000 n=20) Parse-8 241.2n ± 0% 240.9n ± 0% -0.12% (p=0.002 n=20) ParseRFC3339UTC-8 78.52n ± 0% 73.12n ± 0% -6.87% (p=0.000 n=20) ParseRFC3339UTCBytes-8 86.60n ± 0% 81.05n ± 0% -6.42% (p=0.000 n=20) ParseRFC3339TZ-8 291.5n ± 0% 283.1n ± 0% -2.90% (p=0.000 n=20) ParseRFC3339TZBytes-8 343.2n ± 0% 333.2n ± 0% -2.93% (p=0.000 n=20) ParseDuration-8 125.6n ± 0% 125.8n ± 0% +0.16% (p=0.002 n=20) Hour-8 6.458n ± 0% 6.457n ± 0% ~ (p=0.130 n=20) Second-8 6.460n ± 0% 6.456n ± 0% -0.05% (p=0.002 n=20) Year-8 18.68n ± 0% 14.10n ± 0% -24.52% (p=0.000 n=20) Day-8 24.01n ± 0% 17.14n ± 0% -28.61% (p=0.000 n=20) ISOWeek-8 27.55n ± 0% 24.46n ± 0% -11.22% (p=0.000 n=20) GoString-8 157.9n ± 0% 152.9n ± 0% -3.13% (p=0.000 n=20) UnmarshalText-8 290.1n ± 0% 283.4n ± 0% -2.31% (p=0.000 n=20) Geomean 93.84n ± 0% 89.08n ± 0% -5.07% ``` To achieve better performance we should change the supported range of dates: instead of supporting years taking any int value, they should be restricted to a much smaller range which still spans tens of thousands of years. [1] Neri, C, Schneider, L. Euclidean affine functions and their application to calendar algorithms. Softw Pract Exper. 2023; 53(4): 937–970. doi:10.1002/spe.3172 https://onlinelibrary.wiley.com/doi/full/10.1002/spe.3172 For: golang#63844
Change https://go.dev/cl/548155 mentions this issue: |
I believe there's a misunderstanding about the scope of the proposal. Let me clarify. It does not propose limiting all functionalities of Functions that only need the number of ticks from epoch remain the same. Going back to the astronomers example, they can get IAU year by calling @AlexanderYastrebov: There's no need to add an As we can see from comments on other issues, as far as |
I took @normandesjr's code from https://github.com/normandesjr/go/blob/feat/date-algorithm/src/time/time.go and converted https://gist.github.com/unixdj/e31f75d5b49b5a2b3b4fbbae69368937
How much overhead does it add? FWIW, a quick benchmark shows that my version of absDate runs faster. I looked at the assembly and eliminated one addition, but I don't speak amd64. First column: original, second: after changing the beginning of
True, unfortunately. |
@normandesjr Can you also change |
Done. |
I've just did an update with @cassioneri's patch, follow the new benchmark: |
|
So what would the behavior of I think we should not only limit the range, but also limit the intermediate values so that they don't exceed that range: it would be surprising to reject a value in |
TL;DR:
The proposal is to improve the performance of Go's time package using the recent algorithms developed by Neri and Schneider [1]. Their algorithms are faster than alternatives and they have been adopted by libstdc++ (the GNU implementation of the C++ Standard Library) [2], the Linux Kernel [3], .NET [4] and Firefox [5]. My benchmarks confirm performance improvements in Go as well. (See below.)
To achieve better performance we should change the supported range of dates: instead of supporting years taking any
int
value, they should be restricted to a much smaller range which still spans tens of thousands of years.Check the benchmark results that I obtained after implementing Neri-Schneider algorithms in Go source code.
Implementation:
Cassio Neri and I have implemented in Go source code the 32-bits versions of Neri-Schneider algorithms. They support dates from -1,468,000/Mar/01 to 1,471,745/Feb/28. Unfortunately, the current documentation of
doesn't explicitly state the range of supported dates but, since the argument
year
has typeint
, one might expect that any year frommath.MinInt
tomath.MaxInt
is allowed. This is further confirmed by the testTestZoneBounds
which usesmath.MinInt32
as a minimum year. Only this test fails when I replace the current implementations with Neri-Schneider algorithms.The main goal of this issue is to argue that the range supported by Neri-Schneider algorithms is more than enough for all practical applications (*). We also believe that the specification of the time package should explicitly state the range of supported dates. Moreover, this range should be smaller than the one supported by Neri-Schneider algorithms just in case faster but more restrictive algorithms are found in the future. Consider, for instance, the range specified by other popular programming languages:
C++ [6]:
from 1970/Jan/01 - 12,687,428 days to 1970/Jan/01 + 11,248,737 days, that is,
from -32,767/Jan/01 to 32,767/Dec/31.
C# [7]: from 0001/January/01 to 9999/Dec/31.
Ecma Script (a.k.a. JavaScript) [8]:
from 1970/Jan/01 - 100,000,000 days to 1970/Jan/01 + 100,000,000 days, that is,
from -271,821/Apr/20 to 275,760/Sep/13
We believe that any of those ranges are more than enough for the large majority of users and, concretely, suggest to set Go to use the same range as C++. (As mentioned above, the 32-bits implementations support a much larger range.) If you agree that it is fine, then I could submit the PR showing how it was implemented.
Follow the time's package benchmark. As you can see other functions got the benefits as well.
(*) It's worth mentioning that the Gregorian calendar is still not perfectly aligned to Earth's motions and a small error is accumulating. In a few thousands of years the error should be quite noticeable. Furthermore, Earth's rotation is slowing down which amplifies the problem [9]. The point here is that it's virtually impossible for the Gregorian calendar to remain in use in the next few thousands of years, let alone in billions of years.
References:
[1] Neri, C, Schneider, L. Euclidean affine functions and their application to calendar algorithms. Softw Pract Exper. 2023; 53(4): 937–970. doi:10.1002/spe.3172
https://onlinelibrary.wiley.com/doi/full/10.1002/spe.3172
See also Cassio Neri's presentation on calendar algorithms at C++Now 2023.
https://www.youtube.com/watch?v=0s9F4QWAl-E
[2] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/std/chrono;h=10e868e5a036924177cf952a81f4bce10c497106;hb=HEAD#l1673
[3] https://github.com/torvalds/linux/blob/master/kernel/time/timeconv.c#L78 and https://github.com/torvalds/linux/blob/master/drivers/rtc/lib.c#L68
[4] https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/DateTime.cs#L1377
[5] https://searchfox.org/mozilla-central/source/js/src/jsdate.cpp#217
[6] https://eel.is/c++draft/time.cal.ymd#members-20
[7] https://learn.microsoft.com/en-us/dotnet/api/system.datetime.minvalue?view=net-7.0#remarks and https://learn.microsoft.com/en-us/dotnet/api/system.datetime.maxvalue?view=net-7.0#remarks
[8] https://262.ecma-international.org/14.0/#sec-time-values-and-time-range
[9] Duncan S. Marking Time: The Epic Quest to Invent the Perfect Calendar. 1st ed. John Wiley & Sons; 2000.
The text was updated successfully, but these errors were encountered: