New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: timediv should use |= instead of += since we are adding bitsets to an originally cleared value #27529
Comments
Looks like |
If |
Thanks for the chime in @rasky! Interesting and cool that bits.Div* is being added. Perhaps I am misunderstanding how the new API will be used but I think it'll involve a bit of retrofitting given a few restrictions in the doc comments. I just skimmed real quick over it and thought to plugin and play one of the sanity checks in the runtime Line 208 in 22afb35
but the results are unexpected package main_test
import (
"math/bits"
"testing"
)
func TestDiv(t *testing.T) {
v := int64(12345*1000000000+54321)
div := int32(1000000000)
hi, lo := uint(v>>32), uint(v&0xFFFFFFFF)
quo, rem := bits.Div(hi, lo, uint(div))
if g, w := quo, uint(12345); g != w {
t.Errorf("Quotient:\n\tgot %d (0x%X)\n\twant %d (0x%X)", g, g, w, w)
}
if g, w := rem, uint(54321); g != w {
t.Errorf("Remainder :\n\tgot %d (0x%X)\n\twant %d (0x%X)", g, g, w, w)
}
} with results $ go test -run=TestDiv
--- FAIL: TestDiv (0.00s)
div_test.go:15: Quotient:
got 53015942467842 (0x3037BC6B1102)
want 12345 (0x3039)
div_test.go:18: Remainder :
got 515390001 (0x1EB83A31)
want 54321 (0xD431)
FAIL
exit status 1
FAIL _/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/runtime-timediv 0.005s With my point being that: perhaps let's file a separate bug as a reminder to retrofit "math/bits".Div into the runtime once https://golang.org/cl/123157 is landed as you suggested. For now I think we can easily address using |
Change https://golang.org/cl/134231 mentions this issue: |
We should not use timediv on 64-bit machines. We should just use the division operator, as os_linux.go does. |
@odeke-em which arch are you testing that on? Try using Div32 instead of Div to make it behave the same on both 32-bit and 64-bit archs (your code would fail on 64 bit because of the right shift sign extending the 64 bit input). @cherrymui using bits.Div is a portable way of doing a 64 bit division without two different code paths |
@rasky I was on AMD64, Darwin. Thank you for the tip of switching to Div32, that runs correctly and is magnitudes faster than all the others in deed $ go test -v -run=^$ -bench=As
goos: darwin
goarch: amd64
BenchmarkAsPlus-8 50000000 27.8 ns/op
BenchmarkAsOr-8 50000000 26.0 ns/op
BenchmarkAsDiv-8 2000000000 0.39 ns/op
PASS
ok _/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/runtime-timediv 3.568s |
I'm not sure we want the runtime package to import math/bits, although it is probably a possibility. And, with the current implementation in CL 123157, it probably doesn't work. math/bits.Div32 is implemented simply with 64-bit division. On 32-bit systems, this lowers to runtime.uint64div. As the comment in timediv says, this may not fit into nosplit stack limit (maybe it does now, I don't know, at least it didn't when timediv was introduced). And not all 32-bit machines have 64-bit-divided-by-32-bit intrinsic. Of course we could probably have a different implementation of Div32 that uses very small stack space. |
@odeke-em : That last benchmark result looks like the entire operation was optimized away. It's unlikely that the chip is doing a divide in less than a nanosecond. |
If 64-bit division now fits into nosplit stack limit on all machines, I suggest changing back to using the division operator. |
@randall77 aha, yeah it looked too good to be true, but alas seems to actually be running alright package main
import (
"math/bits"
"testing"
)
func BenchmarkAsDiv(b *testing.B) {
v := int64(12345*1000000000 + 54321)
div := int32(1000000000)
var quo, rem uint32
for i := 0; i < b.N; i++ {
hi, lo := uint32(v>>32), uint32(v&0xFFFFFFFF)
quo, rem = 0, 0
quo, rem = bits.Div32(hi, lo, uint32(div))
}
if g, w := quo, uint32(12345); g != w {
b.Fatalf("Got %d\nWant %d", g, w)
}
if g, w := rem, uint32(54321); g != w {
b.Fatalf("Got %d\nWant %d", g, w)
}
} still giving $ go test -run=$^ -bench=As
goos: darwin
goarch: amd64
BenchmarkAsPlus-8 50000000 24.5 ns/op
BenchmarkAsOr-8 100000000 22.8 ns/op
BenchmarkAsDiv-8 2000000000 0.33 ns/op
PASS
ok _/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/runtime-timediv 4.252s $ uname -a
Darwin Emmanuels-MacBook-Pro-2.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 |
We'll have to figure this out, as there are many other points where runtime could benefit from
Right, I was implying that an intrinsification CL would eventually land (like CL129415, but for 32-bit archs).
Oh that's right, I guess ARM doesn't. |
@rasky would you mind perhaps filing a reminder issue to examine what we can do around the standard library with math/bits.Div* when the CL lands? I ask because that conversation dominated this issue(which is now closed), but requires more thoughts and investigation, but also the conversations are very valuable and would be nice to keep it going on a fresh issue. Thank you! |
Right now just before bed, I have been studying
runtime/os_darwin.go
in relation to https://go-review.googlesource.com/c/go/+/133655 to try to write a regression/end-to-end test after that CL is merged.I noticed that in the code for
timediv
go/src/runtime/runtime1.go
Lines 414 to 421 in 22afb35
we try to reduce the original timeunit by at most ((1<<31)-1) = 0x7FFFFFFF if (v >= conversionUnit<<bits)
where:
In particular notice that we are doing a
+=
for values that are essentially bit sets for a value that started at 0go/src/runtime/runtime1.go
Line 419 in 22afb35
i.e. we are only doing power of 2 increments/toggling bits.
We could replace that by
|=
and that code should be faster by a couple of nanoseconds and to justify the change:ns
so at most ((1<<31)-1) * 1e9 ns (since max epoch second = (1<<31)-1) which by any measure will mean that every single invocation oftimediv
with a system nanotime that began later than September 8th 2001 ~7PM, will run through that loop at least once -- from
ns
to at bests
, with smaller units it is even more guaranteed to run since (1e9<<30) (div=1e9, ns to s)I believe that we could shave off a couple of nanoseconds and take advantage of registers.
My simple & dirty benchmark at https://play.golang.org/p/tWy9l5oH0tG when run shows an 18% decrement in time
timediv
is in a very hot path throughout Go and the ecosystem; when we perform any semaphore operations e.g. while locking and unlocking a mutex on many OSes, of which Go's bread and butter for dealing with concurrent accesses are mutex operations and those are also used all around the standard library.I am filing this issue because the comment for that function says that is is a very special function
go/src/runtime/runtime1.go
Lines 410 to 411 in 22afb35
and I also wonder if perhaps there was an intention of using
+=
?Please feel free to correct me or provide some ideas, my apologies for any mistakes #~4AMChecks. Thank you!
The text was updated successfully, but these errors were encountered: