New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
math/bits: consider generating lookup len8tab with type int array instead of uint8 array #23919
Comments
/cc @griesemer |
That wastes 3 out of 4 bytes in the L1 cache (7 out of 8 on 64 bit platforms). |
I agree that increasing the lookup table by a factor of 8 does not seem worth the performance increase. Unfortunately our microbenchmarks do not measure performance when the cache is cold or the negative cache eviction effects on other part of the code. The effect is larger than i would have expected. Maybe its worth looking in the assembly if the difference is more than a MOVQ vs MOVBZX and if there are any load and loop alignment effects. |
I thought most of these were going to be intrinsified, in which case the could extra ns overhead shouldn't exist. |
I'm not sure optimizing nanoseconds of performance of the reference implementation is worth it. |
Thanks for the input y'all
@cznic wasteful in deed, and naive but it just struck me that we were making the int(uint8) in every spot. I hesitated to propose a "row compression" i.e still uses 256 bytes-ish, is a [64]int array. I hesitated because I thought perhaps it would be less readable and more complex. Thankfully this feedback gave me the license to post it so: var len8Rows [64]int
func init() {
for i, j := 0, 0; i < len(len8tabInt); {
r := len8tabInt[i : i+4]
len8Rows[j] = r[3]<<24 | r[2]<<16 | r[1]<<8 | r[0]
i, j = i+4, j+1
}
}
func len64Compressed(x uint64) (n int) {
if x >= 1<<32 {
x >>= 32
n = 32
}
if x >= 1<<16 {
x >>= 16
n += 16
}
if x >= 1<<8 {
x >>= 8
n += 8
}
return n + (0xFF & (len8Rows[x>>2] >> ((x & 0x03)<<3)))
} which I believe is cache friendlier, but the results from time/op aren't radically any better.
@martisch I'd love to learn how to do this. If one day you are bored and have time to kill, please feel free to work on this and ping me too, if it interests you.
@ericlagergren I thought so too
@rasky gotcha gotcha. I'll leave this open for three days just in case anyone has any thoughts or ideas, and then close it. UPDATE: I fixed the algorithm for correctness as my order of reading didn't match that of discovery in len8tab |
The choice of []uint8 (rather than []int) was deliberate; taking into the account both the savings in space and the extra cost of conversion (and mix of byte vs non-byte operations on x86 processors, which in the past cost pipeline stalls). Let's leave this as is. There's no reason to further micro-optimize the default implementation at a cost of more memory/cache use when there's optimized versions available when needed. @odeke-em I leave this to you to close. |
@griesemer thanks, agreed. I'll close it. As a thought experiment when you get bored or free, please feel free to take a look at my version in #23919 (comment) which is just as space and cache efficient. |
@odeke-em It sounds like your version is not radically better than what we have now; so I'd say let's stick with the more straight-forward (simpler) solution that we have now. Thanks. |
Cool, thanks for taking a look at it @griesemer, and everyone, it was a great discussion. |
I was just studying math/bits and noticed that the generated bits_table.go's len8tab is an uint8 array [256]uint8. However all its uses have an int conversion of the results i.e.
That is:
go/src/math/bits/bits.go
Line 290 in dfb0e4f
go/src/math/bits/bits.go
Line 299 in dfb0e4f
go/src/math/bits/bits.go
Line 312 in dfb0e4f
go/src/math/bits/bits.go
Line 329 in dfb0e4f
I believe that the usage provides the impetus for us to directly generate a [256]int array plus we can squeeze a few point nanoseconds of efficiency out
as per gist https://gist.github.com/odeke-em/b0cb6f29d04cdda934726e23b414694a
which I ran over 20 samples
The text was updated successfully, but these errors were encountered: