Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: time: improve performance of calendar algorithms and clarify its specification #63844

Open
normandesjr opened this issue Oct 31, 2023 · 19 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Proposal
Milestone

Comments

@normandesjr
Copy link

TL;DR:

The proposal is to improve the performance of Go's time package using the recent algorithms developed by Neri and Schneider [1]. Their algorithms are faster than alternatives and they have been adopted by libstdc++ (the GNU implementation of the C++ Standard Library) [2], the Linux Kernel [3], .NET [4] and Firefox [5]. My benchmarks confirm performance improvements in Go as well. (See below.)

To achieve better performance we should change the supported range of dates: instead of supporting years taking any int value, they should be restricted to a much smaller range which still spans tens of thousands of years.

Check the benchmark results that I obtained after implementing Neri-Schneider algorithms in Go source code.

only-date

Implementation:

Cassio Neri and I have implemented in Go source code the 32-bits versions of Neri-Schneider algorithms. They support dates from -1,468,000/Mar/01 to 1,471,745/Feb/28. Unfortunately, the current documentation of

`func Date(year int, month Month, day, hour, min, sec, nsec int, loc *Location) Time`

doesn't explicitly state the range of supported dates but, since the argument year has type int, one might expect that any year from math.MinInt to math.MaxInt is allowed. This is further confirmed by the test TestZoneBounds which uses math.MinInt32 as a minimum year. Only this test fails when I replace the current implementations with Neri-Schneider algorithms.

The main goal of this issue is to argue that the range supported by Neri-Schneider algorithms is more than enough for all practical applications (*). We also believe that the specification of the time package should explicitly state the range of supported dates. Moreover, this range should be smaller than the one supported by Neri-Schneider algorithms just in case faster but more restrictive algorithms are found in the future. Consider, for instance, the range specified by other popular programming languages:

C++ [6]:
from 1970/Jan/01 - 12,687,428 days to 1970/Jan/01 + 11,248,737 days, that is,
from -32,767/Jan/01 to 32,767/Dec/31.

C# [7]: from 0001/January/01 to 9999/Dec/31.

Ecma Script (a.k.a. JavaScript) [8]:
from 1970/Jan/01 - 100,000,000 days to 1970/Jan/01 + 100,000,000 days, that is,
from -271,821/Apr/20 to 275,760/Sep/13

We believe that any of those ranges are more than enough for the large majority of users and, concretely, suggest to set Go to use the same range as C++. (As mentioned above, the 32-bits implementations support a much larger range.) If you agree that it is fine, then I could submit the PR showing how it was implemented.

Follow the time's package benchmark. As you can see other functions got the benefits as well.

package-time

(*) It's worth mentioning that the Gregorian calendar is still not perfectly aligned to Earth's motions and a small error is accumulating. In a few thousands of years the error should be quite noticeable. Furthermore, Earth's rotation is slowing down which amplifies the problem [9]. The point here is that it's virtually impossible for the Gregorian calendar to remain in use in the next few thousands of years, let alone in billions of years.

References:

[1] Neri, C, Schneider, L. Euclidean affine functions and their application to calendar algorithms. Softw Pract Exper. 2023; 53(4): 937–970. doi:10.1002/spe.3172
https://onlinelibrary.wiley.com/doi/full/10.1002/spe.3172

See also Cassio Neri's presentation on calendar algorithms at C++Now 2023.
https://www.youtube.com/watch?v=0s9F4QWAl-E

[2] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/std/chrono;h=10e868e5a036924177cf952a81f4bce10c497106;hb=HEAD#l1673

[3] https://github.com/torvalds/linux/blob/master/kernel/time/timeconv.c#L78 and https://github.com/torvalds/linux/blob/master/drivers/rtc/lib.c#L68

[4] https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/DateTime.cs#L1377

[5] https://searchfox.org/mozilla-central/source/js/src/jsdate.cpp#217

[6] https://eel.is/c++draft/time.cal.ymd#members-20

[7] https://learn.microsoft.com/en-us/dotnet/api/system.datetime.minvalue?view=net-7.0#remarks and https://learn.microsoft.com/en-us/dotnet/api/system.datetime.maxvalue?view=net-7.0#remarks

[8] https://262.ecma-international.org/14.0/#sec-time-values-and-time-range

[9] Duncan S. Marking Time: The Epic Quest to Invent the Perfect Calendar. 1st ed. John Wiley & Sons; 2000.

@mauri870 mauri870 changed the title Improve performance of calendar algorithms in the time package and clarify its specification. time: improve performance of calendar algorithms and clarify its specification Oct 31, 2023
@mauri870
Copy link
Member

kindly CC @rsc (per https://dev.golang.org/owners)

@seankhliao seankhliao added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Oct 31, 2023
@mauri870 mauri870 changed the title time: improve performance of calendar algorithms and clarify its specification proposal: time: improve performance of calendar algorithms and clarify its specification Nov 1, 2023
@gopherbot gopherbot added this to the Proposal milestone Nov 1, 2023
@mauri870 mauri870 added Proposal and removed Proposal labels Nov 1, 2023
@AlexanderYastrebov
Copy link
Contributor

AlexanderYastrebov commented Nov 14, 2023

We also believe that the specification of the time package should explicitly state the range of supported dates.

It should be possible to split codepath into slow and fast versions depending on the year value without defining any new constraints on the year value, shouldn't it?

@normandesjr
Copy link
Author

We also believe that the specification of the time package should explicitly state the range of supported dates.

It should be possible to split codepath into slow and fast versions depending on the year value without defining any new constraints on the yer value, shouldn't it?

Thanks @AlexanderYastrebov . It's possible, but I'm not sure if it's suitable. I've changed some functions, like absDate, daysSinceEpoch and Date, and we would add some complexity (and another branch) for a range of date that is unusal and that other languages already don't support. IMHO we should even add some asserts and panic if the year is outside the "new" specification. Make sense?

@unixdj
Copy link

unixdj commented Nov 25, 2023

@normandesjr

IMHO we should even add some asserts and panic if the year is outside the "new" specification. Make sense?

Panicking is a bit extreme, innit?

If I understand Cassio Neri's example C++ code correctly, he restricts values only for the duration of the calculation. Something like this (but also handling negative years):

	const (
		CycleLength     = SomeNumberIDK
		DaysPer400Years = 146097
		YearCycle       = 400 * CycleLength
		DayCycle        = DaysPer400Years * CycleLength
	)
	cycles, year := year / YearCycle, year % YearCycle
	days := CalculateDate(year, month, day)
	days += cycles * DayCycle

I would propose limiting timestamps to those that can be represented by int64 (i.e., handling overflow), or close to it (capping the year value to prevent overflow), and using the mechanism above to deal with calculations whose intermediate values may overflow.

Furthermore, Earth's rotation is slowing down which amplifies the problem

UTC is not the only kind of UT, so this shouldn't prevent me from printing the time when atoms started forming with nanosecond precision in my local timezone, so what if my city or my galaxy doesn't exist yet ;)

@cassioneri
Copy link

I wish to share my experience after having studied the Gregorian calendar and implemented myself support for it in the Linux Kernel, libstdc++, Firefox (for its Javascript engine) and also assisted others to implement them for .NET and Java (ongoing) official standard libraries and also a Rust user library.

The range of supported dates must be very clearly specified by the language as it is a contract which users can rely on. This is the first piece of information that I searched when I implemented these algorithms for any language's standard library.

What should be a reasonable range of dates selected by the language? There's no absolute answer for this question and @normandesjr's proposal cites a few choices taken by different languages. Pragmatism is required to select one since complications come from historical, geographical and astronomical aspects. To know more about the historical and geographical points, please, see my talk at C++ Now 2023 (from 27min7s) but let me elaborate on the astronomical point raised by @normandesjr and commented by @unixdj. This question is not about UTC and time zones.

Astronomers calculate that in the time of the dinosaurs, during a year Earth would spin about 400 times and not the 365 or so that it does nowadays. In other words, years were 400 days or so long. If there were people and computers around at that time they would most certainly not be using a calendar remotely similar to the Gregorian and thus for us to talk about 1st January year -65,000,000 is a bit of nonsense. Similarly, in a few million years the number of days in a year is expected to be far less than 365. I doubt the Gregorian calendar will still be used at this point. We don't need to go that far and even if Earth's movements were steady, the fact that the ratio between the duration of a day and the duration of a year is an irrational number means that the Gregorian calendar will be out of sync with Solstices and Equinoxes in a couple of thousands of years which again raises the question whether the Gregorian calendar will be used, unchanged, at this point.

In light of the above, should Golang commit itself to support a range of Gregorian dates spanning billions of years? IMHO, it should restrict the range to a more pragmatic one that suits well and efficiently the large majority of users now and in many generations to come but no more. A more restricted range allows more algorithms to be used and the best performing one (known or to be discovered) can be implemented. The point is that current users should not pay the price of using less performant algorithms that are destined to become obsolete far in the future just to support the current calendar far beyond its existence.

@normandesjr's proposal suggests picking the same range as C++ (from year -32767 to year 32767), which I believe is a very sensible choice, but anything not much larger than a few thousands of years is more than enough. Notice that changing the internal representation of time and its resolution is not proposed. Only conversion of internal representation from and to calendar dates (year, month, day) are touched.

Another advantage of a more restrictive range is that it allows a unit test to run through all possible dates and check the result. I've done that for libstdc++ and the Linux Kernel and these tests run under a second.

Finally, panicking if the input is out of range is useful. For Firefox, I added just a comment stating support for the range required by the JavaScript specification (Ecma-262). The thoughtful reviewer, Andre Bargul (towards whom I'm very grateful) asked me to assert that condition, which I did. Then a CI test crashed Firefox on this assertion revealing that there's was a bug elsewhere. He then fixed the bug for the benefit of all users.

I hope this helps. Obviously I'd be glad to see my own algorithms being used in Golang but the question of precision of the specification is, in my opinion, more important than the choice of algorithm. @normandesjr's proposal tackles this point. It's a very good proposal.

@unixdj
Copy link

unixdj commented Nov 28, 2023

@cassioneri

The range of supported dates must be very clearly specified by the language

Definitely.

I'm inclined to make this range as broad as possible, because if you have a time.Time value, ideally you should be able to display it. Perhaps rather than restricting the range of supported days, a better idea would be to restrict the range of supported (or even "legal") Time values.

Also, see below regarding API compatibility.

This question is not about UTC and time zones.

Sure, I was not entirely serious about local time. And I was mistaken about UT, I meant TAI (International Atomic Time).

Loved your talk, BTW.

If there were people and computers around at that time they would most certainly not be using a calendar remotely similar to the Gregorian and thus for us to talk about 1st January year -65,000,000 is a bit of nonsense.

I would argue that perpetual Gregorian calendar is a fiction anyway, and astrophysicists talking about "13 billion years ago" is also a bit of nonsense, as you can't have a year without a planet.

Unless you define a year as 31,556,952 SI seconds, then it's fine I guess.

picking the same range as C++ (from year -32767 to year 32767), which I believe is a very sensible choice

The year range that tzcode (zoneinfo's code used in glibc and libc in many other Unix systems) supports is -2,147,481,748 to 2,147,485,547, which is the full range of int32_t whose 0 value means year 1900.

Finally, panicking if the input is out of range is useful.

Here I disagree. The Go standard library generally panics only on application programmer errors or if the internal state goes haywire. E.g., the time package panics on non-initialised Timer or Ticker values, non-positive intervals and nil Location values (application bugs), and on conditions indicating a bug in the standard library.

Time values may come from origins such as user input and database entries, and there may well be code out there that operates on values far away from now (e.g., for carbon dating or astrophysics), formatting the time as fmt.Sprintf(-t.Year() / 1000000, "million years ago"). Because if Go supports timestamps of almost 300 billion years from now, why roll your own?

Besides, introducing panics into date calculation code would break the Go 1 API compatibility promise. Which is another reason to support as wide range as possible.

@cassioneri
Copy link

@unixdj

[...] there may well be code out there that operates on values far away from now (e.g., for carbon dating or astrophysics), formatting the time as fmt.Sprintf(-t.Year() / 1000000, "million years ago").

Indeed, astrophysicists might be interested in billions of years. But this is a niche domain with clever people that are very capable of rolling their own algorithms.

Because if Go supports timestamps of almost 300 billion years from now, why roll your own?

For two reasons.

  1. Supporting billions of years rules out algorithms that require fewer instructions (save memory, especially in instruction cache), are faster and spend less energy. Hence, if Golang support billions of years, every user will bear the costs for, presumably, the benefit of a few specialists in niche domains.

  2. The clever people from these niche domains might not be interested in Time.Year() after all! Indeed, Time.Year() calculates the "year" that most of us consider in everyday live, i.e., year in the Gregorian calendar. (As you said, the "perpetual Gregorian calendar is a fiction anyway", so who uses it for calculations of billions of years?) Astronomers talk about sideral year, solar year, tropical year, vernal year, etc. They have their own time keeping system called Julian Day Number (JD) invented by Joseph Scaliger in 1583 and based on the average Julian calendar year of 365.25 days. Here is a quote from the International Astronomical Union (IAU) [1]:

Although there are several different kinds of year, the IAU regards a year as a Julian year of 365.25 days (31.5576 million seconds).

Hence, even if Time.Year() supports billion of years, for astronomers, it produces the wrong result and they won't use it.
Furthermore, from the number of SI seconds since the JD epoch, they can get the number of years by a simple division by 31,557,600. In summary, for astronomers Time.Year() is wrong and slower than what they can implement.

[1] https://www.iau.org/public/themes/measuring/

@unixdj
Copy link

unixdj commented Nov 29, 2023

@cassioneri

Although there are several different kinds of year, the IAU regards a year as a Julian year of 365.25 days (31.5576 million seconds).

Interesting, I didn't know this.

But this is a niche domain with clever people that are very capable of rolling their own algorithms.

True. But I suspect that in many cases time.Time is probably involved at some point, which is what I meant by "why not" (it's there, might as well use it). I suspect that people working, say, with dates in Ancient Rome might wrap Time and override some methods for converting dates to use old Roman calendars before a certain date and call standard methods after, and astronomers might display the time as Julian years but convert it using Time.Unix for storage. Go enables this approach quite well, and I've done it with Time (for other reasons).

In any case, the Go core team will likely insist that the Go 1 compatibility promise is not broken.

Look, I understand your point. You never need to display the time of the formation of a galaxy as -11000000000-08-06 11:23:05.123456789 +0200 CEST. And I see that your algorithms are so fast that doing div-mod of the time by 10,000 or 100,000 years (perhaps conditionally after a comparison) will make calculations significantly slower.

Not the whole range works now, mind you. The time pkg doesn't convert dates before year -292277022399 of the perpetual Gregorian calendar, some 250 years after 1970-01-01 - 1<<63 seconds, correctly, and I expect weirdness on the other end too.

But I'm just not sure how much you can further restrict the range without breaking the compatibility promise that the Go team takes quite seriously.

And I hope you can see why panicking is not a good solution for Go, or at least for Go 1. I'm pretty sure the Go team would prefer to cap the value, return sentinel values or even document the limitation and return garbage.

Speaking of which, wouldn't a check like

if uint64(t) >= yearOneMillion {
	cycle, t = t / millionYears, t % millionYears
}
y, m, d := calculate(t)
y += cycle * 1000000

be almost as fast as this, for times within the range?

if t < lowLimit || t > highLimit {
	panic("what are you, an immortal?")
}
y, m, d := calculate(t)

@AlexanderYastrebov
Copy link
Contributor

As I wrote in #63844 (comment) it should be possible to have two implementations conditioned on the year value. The existing implementation would need to stay anyways to verify correctness of the new code. Once proof of the pudding exists it might be easier to establish new API constraints.

@AlexanderYastrebov
Copy link
Contributor

There is also a related #56909 that reports year overflow above/below certain values

@normandesjr
Copy link
Author

I've given an idea about panic, it was like a suggestion, the main idea here is to have the great benefit of a date algorithm faster (more than 25%) for a range that everybody needs.

normandesjr added a commit to normandesjr/go that referenced this issue Dec 7, 2023
…ification

This implementation improves the performance of Go's time package using the recent algorithms developed by Neri and Schneider [1]. Their algorithms are faster than alternatives, my benchmarks confirm performance improvements in Go as well.

```
goos: linux
goarch: amd64
pkg: time
cpu: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
                       | /tmp/master.txt |   /tmp/nerishcneider.txt
                       |     sec/op      |   sec/op    vs base
Now-8                        69.38n ± 0%   69.33n ± 0%  -0.07% (p=0.001 n=20)
NowUnixNano-8                70.35n ± 0%   70.08n ± 0%  -0.38% (p=0.000 n=20)
NowUnixMilli-8               71.47n ± 0%   71.26n ± 0%  -0.30% (p=0.000 n=20)
NowUnixMicro-8               71.32n ± 0%   71.21n ± 0%  -0.15% (p=0.000 n=20)
Format-8                     441.9n ± 0%   433.8n ± 0%  -1.83% (p=0.000 n=20)
FormatRFC3339-8              293.9n ± 0%   283.4n ± 0%  -3.57% (p=0.000 n=20)
FormatRFC3339Nano-8          297.5n ± 0%   286.9n ± 0%  -3.55% (p=0.000 n=20)
FormatNow-8                  242.4n ± 0%   239.7n ± 0%  -1.11% (p=0.000 n=20)
MarshalJSON-8                144.0n ± 0%   136.2n ± 0%  -5.42% (p=0.000 n=20)
MarshalText-8                142.0n ± 0%   136.2n ± 0%  -4.02% (p=0.000 n=20)
Parse-8                      241.2n ± 0%   240.9n ± 0%  -0.12% (p=0.002 n=20)
ParseRFC3339UTC-8            78.52n ± 0%   73.12n ± 0%  -6.87% (p=0.000 n=20)
ParseRFC3339UTCBytes-8       86.60n ± 0%   81.05n ± 0%  -6.42% (p=0.000 n=20)
ParseRFC3339TZ-8             291.5n ± 0%   283.1n ± 0%  -2.90% (p=0.000 n=20)
ParseRFC3339TZBytes-8        343.2n ± 0%   333.2n ± 0%  -2.93% (p=0.000 n=20)
ParseDuration-8              125.6n ± 0%   125.8n ± 0%  +0.16% (p=0.002 n=20)
Hour-8                       6.458n ± 0%   6.457n ± 0%       ~ (p=0.130 n=20)
Second-8                     6.460n ± 0%   6.456n ± 0%  -0.05% (p=0.002 n=20)
Year-8                       18.68n ± 0%   14.10n ± 0% -24.52% (p=0.000 n=20)
Day-8                        24.01n ± 0%   17.14n ± 0% -28.61% (p=0.000 n=20)
ISOWeek-8                    27.55n ± 0%   24.46n ± 0% -11.22% (p=0.000 n=20)
GoString-8                   157.9n ± 0%   152.9n ± 0%  -3.13% (p=0.000 n=20)
UnmarshalText-8              290.1n ± 0%   283.4n ± 0%  -2.31% (p=0.000 n=20)
Geomean                      93.84n ± 0%   89.08n ± 0%  -5.07%
```

To achieve better performance we should change the supported range of dates: instead of supporting years taking any int value, they should be restricted to a much smaller range which still spans tens of thousands of years.

[1] Neri, C, Schneider, L. Euclidean affine functions and their application to calendar algorithms. Softw Pract Exper. 2023; 53(4): 937–970. doi:10.1002/spe.3172
https://onlinelibrary.wiley.com/doi/full/10.1002/spe.3172

For: golang#63844
@gopherbot
Copy link

Change https://go.dev/cl/548155 mentions this issue: time: improve performance of calendar algorithms and clarify its spec…

@cassioneri
Copy link

I believe there's a misunderstanding about the scope of the proposal. Let me clarify.

It does not propose limiting all functionalities of time to smaller ranges. It covers only functions based on the Gregorian calendar, namely, Date, Year, Month, Day, ISOWeek, YearDay and AddDate.

Functions that only need the number of ticks from epoch remain the same. Going back to the astronomers example, they can get IAU year by calling t.Unix() and dividing the result by 31,557,600. The proposal would not change anything on this perspective and billions of IAU years would still be supported in this way.

@AlexanderYastrebov: There's no need to add an if. The 64-bits version of our algorithms can support billions of Gregorian years. This is more effective than adding the if but still more costly than using the 32-bits version (which supports "only" millions of years.) I'd prefer avoiding extra costs altogether since supporting billions of Gregorian years, as I'm trying to say, is meaningless.

As we can see from comments on other issues, as far as time is concerned, Go is very vague on its promises. People have revoursed to reverse engineering or to look very deeply in the sources to find out supported ranges. IMHO, the current situation is quite messy and fragile. I think this proposal is a step in the right direction to fix many issues. It sets out a reasonable range of dates that is enough for the vast majority, if not all, real users and brings clarity, speed and energy savings for them.

@unixdj
Copy link

unixdj commented Dec 9, 2023

The 64-bits version of our algorithms can support billions of Gregorian years.

I took @normandesjr's code from https://github.com/normandesjr/go/blob/feat/date-algorithm/src/time/time.go and converted absDate and Date to use uint64. It works beautifully. Check this out (I used it outside the Go source tree, to avoid stashing my changes):

https://gist.github.com/unixdj/e31f75d5b49b5a2b3b4fbbae69368937

This is more effective than adding the if but still more costly than using the 32-bits version (which supports "only" millions of years.)

How much overhead does it add?

FWIW, a quick benchmark shows that my version of absDate runs faster. I looked at the assembly and eliminated one addition, but I don't speak amd64. First column: original, second: after changing the beginning of absDate as below, third: uint64, times for 100M runs.

        const (
                s = 3670
                K = 719468 + 146097*s
                L = 400 * s
        )
        N := uint32(abs/secondsPerDay - uint64(unixToInternal+internalToAbsolute)/secondsPerDay + K)
743.328392ms    708.183141ms    644.01476ms
718.081683ms    691.610367ms    653.935377ms
738.729335ms    701.506694ms    656.245927ms
736.034086ms    702.445939ms    641.974361ms
725.429312ms    688.144645ms    639.662721ms
722.263895ms    773.823083ms    657.371755ms
719.608441ms    696.451653ms    649.812959ms
715.551736ms    692.676515ms    647.024967ms

As we can see from comments on other issues, as far as time is concerned, Go is very vague on its promises.

True, unfortunately.

@unixdj
Copy link

unixdj commented Dec 9, 2023

@normandesjr Can you also change isLeap to @cassioneri's algorithm?

@normandesjr
Copy link
Author

@normandesjr Can you also change isLeap to @cassioneri's algorithm?

Done.

@normandesjr
Copy link
Author

I've just did an update with @cassioneri's patch, follow the new benchmark:

benchmar-2023-12-10

@unixdj
Copy link

unixdj commented Dec 10, 2023

Now takes more time now? That's weird.

@bcmills
Copy link
Contributor

bcmills commented Feb 23, 2024

It does not propose limiting all functionalities of time to smaller ranges. It covers only functions based on the Gregorian calendar, namely, Date, Year, Month, Day, ISOWeek, YearDay and AddDate.

So what would the behavior of Date be for a Time value produced by calling Add with a positive duration on the maximum supported time.Time?

I think we should not only limit the range, but also limit the intermediate values so that they don't exceed that range: it would be surprising to reject a value in time.Date only to have exactly that same value returned from a call to the Date method on a time.Time variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Proposal
Projects
Status: Incoming
Development

No branches or pull requests

8 participants