Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/bits: an integer bit twiddling library #18616

Closed
brtzsnr opened this issue Jan 11, 2017 · 168 comments
Closed

math/bits: an integer bit twiddling library #18616

brtzsnr opened this issue Jan 11, 2017 · 168 comments

Comments

@brtzsnr
Copy link
Contributor

brtzsnr commented Jan 11, 2017

Previous discussions at #17373 and #10757.

Abstract

This proposal introduces a set of API for integer bit twiddling.

Background

This proposal introduces a set of API for integer bit twiddling. For this proposal we are interested in the following functions:

  • ctz - count trailing zeros.
  • clz - count leading zeros; log_2.
  • popcnt - count population; hamming distance; integer parity.
  • bswap - reverses the order of bytes.

These functions were picked by surveying:

We limited ourselves to these four functions because other twiddling
tricks are very simple to implement using the proposed library,
or already available Go constructs.

We found implementations for a subset of the selected twiddling functions
in many packages including runtime, compiler and tools:

package clz ctz popcnt bswap
math/big X X
runtime/internal/sys X X X
tools/container/intsets X X
cmd/compile/internal/ssa X X
code.google.com/p/intmath X X
github.com/hideo55/go-popcount X (asm)
github.com/RoaringBitmap/roaring X X (asm)
github.com/tHinqa/bitset X X X
github.com/willf/bitset X X (asm)
gopl.io/ch2/popcount X
GCC builtins X X X X

Many other packages implement a subset of these functions:

Similarly hardware providers have recognized the importance
of such functions and included machine level support.
Without hardware support these operations are very expensive.

arch clz ctz popcnt bswap
AMD64 X X X X
ARM X X ? X
ARM64 X X ? X
S390X X X ? X

All bit twiddling functions, except popcnt, are already implemented by runtime/internal/sys and receive special support from the compiler in order to "to help get the very best performance". However, the compiler support is limited to the runtime package and other Golang users have to reimplement the slower variant of these functions.

Proposal

We introduce a new std library math/bits with the following external API, to provides compiler / hardware optimized implementations of clz, ctz, popcnt and bswap functions.

package bits

// SwapBytes16 reverses the order of bytes in a 16-bit integer.
func SwapBytes16(uint16) uint16
// SwapBytes32 reverses the order of bytes in a 32-bit integer.
func SwapBytes32(uint32) uint32
// SwapBytes64 reverses the order of bytes in a 64-bit integer.
func SwapBytes64(uint64) uint64

// TrailingZeros32 counts the number of trailing zeros in a 32-bit integer, and if all are zero, then 32.
func TrailingZeros32(uint32) uint
// TrailingZeros64 counts the number of trailing zeros in a 64-bit integer, and if all are zero, then 64.
func TrailingZeros64(uint64) uint

// LeadingZeros32 counts the number of trailing zeros in a 32-bit integer, and if all are zero, then 32.
func LeadingZeros32(uint32) uint
// LeadingZeros64 counts the number of trailing zeros in a 64-bit integer, and if all are zero, then 64.
func LeadingZeros64(uint64) uint

// Ones32 counts the number of bits set in a 32-bit integer.
func Ones32(uint32) uint
// Ones64 counts the number of bits set in a 64-bit integer.
func Ones64(uint64) uint

Rationale

Alternatives to this proposal are:

  • Bit twiddling functions are implemented in an external library not supported by the compiler. This approach works and is the current state of affairs. Runtime uses the compiler supported methods, while Golang users continue to use the slower implementations.
  • The external library is supported by the compiler. Given that we expect this library to replace runtime/internal/sys it means that this library must be in lock with the compiler and live in the standard library.

Compatibility

This proposal does not change or breaks any existing stdlib API and it conforms to compatibly guidelines.

Implementation

SwapBytes, TrailingZeros and LeadingZeros are already implemented. The only missing function is Ones which can be implemented similarly to the other functions. If this proposal is accepted it can be implemented in time for Go1.9.

Open issues (if applicable)

Names are hard, bike shed is in the comments.

Please suggest additional functions to be included in the comments. Ideally, please include where such function is used in stdlib (e.g. math/big), tools or popular packages.

So far the following functions have been proposed and are under consideration:

  • RotateLeft / RotateRight
    • Pros: &63 is no longer need to compile a rotate with non-const argument to a single instruction on x86, x86-64..
    • Cons: very short/simple to implement and inline.
    • Used: crypto/ uses the constant rotate which is handled properly by the compiler.
  • ReverseBits
    • Pros: ?
    • Cons: ?
    • Used: ?
  • Add/Sub with carry return
    • Pros: Expensive otherwise
    • Cons: ?
    • Used: math/big

History

14.Jan: Clarified the output of TrailingZeros and LeadingZeros when the argument is 0.
14.Jan: Renamed methods: CountTrailingZeros -> TrailingZeros, CountLeadingZeros -> LeadingZeros, CountOnes -> Ones.
13.Jan: Fixed architecture name.
11.Jan: Initial proposal opened to public.

@cespare
Copy link
Contributor

cespare commented Jan 11, 2017

@brtzsnr perhaps you should submit this document to the proposal repo as outlined in the proposal process steps?

Since it's already markdown following the template, it should be easy to copy-paste into a CL creating a file design/18616-bit-twiddling.md (or whatever).

@brtzsnr
Copy link
Contributor Author

brtzsnr commented Jan 11, 2017

@cespare from https://github.com/golang/proposal "if the author wants to write a design doc, then they can write one". It started as a design doc, if there is strong feeling that I should submit this, I'm totally fine.

@griesemer
Copy link
Contributor

I'd be ok with this, it's common enough functionality used in many algorithmic libraries and math/bits seems like an appropriate place.

(For one, math/big also implements nlz (== clz).)

There's probably some bike shedding about the names. I for one would prefer the functions to say what they return rather than what they do; which in turn may lead to shorter names. For instance:

bits.TrailingZeros64(x) rather than bits.CountTrailingZeros64(x)

and so forth.

@griesemer
Copy link
Contributor

griesemer commented Jan 11, 2017

The proposal seems pretty clear and minimal - a design doc seems overkill. I think a CL would be more appropriate at this point.

(That is a CL with API and basic implementation - for purpose of discussion in place of a design doc. We still need to decide if this proposal should be accepted or not.)

@cespare
Copy link
Contributor

cespare commented Jan 11, 2017

@brtzsnr has already written the design document: it's in the issue description and it follows the the template. I assumed that there was some value in having these documents all in one location.

@cespare
Copy link
Contributor

cespare commented Jan 11, 2017

The last arch listed in the hardware support table is "BSWAP" -- typo?

@josharian
Copy link
Contributor

Thanks for writing this up.

The doc string for ctz and clz should specify the result when passed 0.

I also prefer (e.g.) TrailingZeros32 to CountTrailingZeros32. I'd also be happy with Ctz32. It is concise, familiar to most, and easily googleable for the rest.

@cherrymui
Copy link
Member

Thanks for the proposal.
I just want to add that we probably also want to focus on the use, instead of focusing only on the instructions. For example, @dr2chase once found there are several log2 functions in the compiler/assembler. It is close to CLZ but not the same. Functions like this should probably also in the bit twiddling package. And maybe also include useful functions that are not related to these instructions.

@bradfitz bradfitz added this to the Proposal milestone Jan 13, 2017
@minux
Copy link
Member

minux commented Jan 13, 2017 via email

@dsnet
Copy link
Member

dsnet commented Jan 13, 2017

@minux, fortunately, every bit twiddling function I've needed so far is exactly the ones that are in this proposal.

@dr2chase
Copy link
Contributor

Following Hacker's Delight has the advantage that we don't need to waste time arguing about the names.

@minux
Copy link
Member

minux commented Jan 13, 2017 via email

@josharian
Copy link
Contributor

Relatedly, functions to check whether an add/multiply will overflow.

@thepudds
Copy link
Contributor

A related data point: there are various issues that have been filed for faster cgo. In one such example (proposal #16051), the fact that fast implementation of bsr/ctz/etc. might happen was mentioned as hopefully chipping away at the set of use cases where people writing go are tempted to use cgo.

@aclements commented on Jul 1, 2016:

@eloff, there's been some discussion (though no concrete proposals to my knowledge) to add functions for things like popcount and bsr that would be compiled as intrinsics where supported (like math.Sqrt is today). With 1.7 we're trying this out in the runtime, which now has a ctz intrinsic available to it on amd64 SSA. Obviously that doesn't solve the overall problem, but it would chip away at one more reason to use cgo in a low-overhead context.

Many people (myself included) are attracted to go because of the performance, so things like this current proposal for bit twiddling would help. (And yes, cgo is also now faster in 1.8, which is nice as well).

@brtzsnr
Copy link
Contributor Author

brtzsnr commented Jan 14, 2017

RotateLeft/Right (can be expanded inline with two shifts, but the compiler
can't always do the transformation due to shift range issues)

@minux Can you elaborate what is the problem?

@randall77
Copy link
Contributor

@brtzsnr : I think what Minux is referring to is that when you write (x << k) | (x >> (64-k)), you know you're using 0 <= k < 64, but the compiler can't read your mind, and it is not obviously derivable from the code. If we had the function

func leftRot(x uint64, k uint) uint64 {
   k &= 63
   return (x << k) | (x >> (64-k))
}

Then we can make sure (via the &63) that the compiler knows that the range of k is bounded.
So if the compiler can't prove the input is bounded, then we need an extra AND. That's better than not generating the rotate assembly at all.

@minux
Copy link
Member

minux commented Jan 14, 2017 via email

@ghost
Copy link

ghost commented Jan 14, 2017

How about byte and bit shuffling (and unshuffling) functions that are used by the blosc compression library? The slides (the shuffling starts from the slide 17). These functions can be SSE2/AVX2 accelerated.

@minux
Copy link
Member

minux commented Jan 14, 2017 via email

@brtzsnr
Copy link
Contributor Author

brtzsnr commented Jan 14, 2017

The current proposed functions have a Go native implementation much larger and disproportionally more expensive than optimum. On the other hand rotate is easy to write inline in a way that the compiler can recognize.

@minux and also everyone else: Do you know where rotate left/right is used with a non-constant number of rotated bits? crypto/sha256 uses rotate for example, but with constant number of bits.

@josharian
Copy link
Contributor

rotate is easy to write inline in a way that the compiler can recognize.

It is easy for those who are familiar with the compiler's internals. Putting it in a math/bits package makes it easy for everyone.

Do you know where rotate left/right is used with a non-constant number of rotated bits?

Here's an example from #9337:

https://play.golang.org/p/rmDG7MR5F9

In each invocation it is a constant number of rotated bits each time, but the function itself is not currently inlined, so it compiles without any rotate instructions. A math/bits library function would definitely help here.

@minux
Copy link
Member

minux commented Jan 14, 2017 via email

@brtzsnr
Copy link
Contributor Author

brtzsnr commented Jan 17, 2017

@josharian The example looks like a bad inliner decision if rot is not inlined. Did you try to write the function as func rot(x, n) instead of rot = func(x, n)?

@minux: I agree with you. I'm not trying to tie the API to a particular instruction set; the hardware support is a nice bonus. My main focus is to find usages in the real code (not toy code) to understand the context, what is the best signature and how important is to provide everyone's favorite function. Compatibility promise will bite us later if we don't this properly now.

For example: What should be the signature of add with carry return? Add(x, y uint64) (c, s uint64)? Looking at math/big we probably need Add(x, y uintptr) (c, s uintptr) too.

@josharian
Copy link
Contributor

The example looks like a bad inliner decision if rot is not inlined.

Yes. It is part of a bug complaining about inlining. :)

Did you try to write the function as func rot(x, n) instead of rot = func(x, n)?

It's not my code--and that's part of the point. And anyway, it is reasonable code.

@mundaym
Copy link
Member

mundaym commented Jan 19, 2017

It would be nice to guarantee (in the package documentation?) that the rotate and byte-swap functions are constant-time operations so that they can be safely used in crypto algorithms. Possibly something to think about for other functions too.

@minux
Copy link
Member

minux commented Jan 19, 2017 via email

@randall77
Copy link
Contributor

I'm inclined to agree with @minux. If you want constant-time crypto primitives, they should live in crypto/subtle. crypto/subtle can easily redirect to math/bits on platforms where those implementations have been verified. They can do something else if a slower but constant-time implementation is required.

@rsc rsc changed the title proposal: an integer bit twiddling library proposal: math/bits: an integer bit twiddling library Jan 23, 2017
@nathany
Copy link
Contributor

nathany commented Apr 4, 2017

@cznic bits are written right-to-left though

@griesemer
Copy link
Contributor

griesemer commented Apr 4, 2017

I'm also in favor of just Rotate (#18616 (comment)) if we make it work in both directions.

As a counter-argument to @aclements concern about the direction: Providing a RotateLeft that works also when rotating right can also provide a false sense of security: "It says RotateLeft so surely it's not rotating right!". In other words, if it says RotateLeft it better doesn't do anything else.

Also, the use of bits.Rotate is really in specialist code only. This is not a function that is used by a large number of programmers. Users that really need it will understand the symmetry of rotations.

@cznic
Copy link
Contributor

cznic commented Apr 4, 2017

@nathany

bits are written right-to-left though

Bits are just binary digits. Digits in number in any base are written left to right - even in most, if not all, right-to-left writing systems. 123 is one-hundred-and-twenty-three, not three-hundred-and-twenty-one.

That the power of the multiplicand for the digit decreases to the right is a different thing.

Once again: I don't care about the direction, it's just that the intuitive one is a matter of personal imagination.

@rogpeppe
Copy link
Contributor

rogpeppe commented Apr 4, 2017 via email

@rsc
Copy link
Contributor

rsc commented Apr 5, 2017

Please keep both RotateLeft and RotateRight instead of doing something half of developers will misremember. It seems fine to handle negative numbers though.

@robpike
Copy link
Contributor

robpike commented Apr 5, 2017

99% of developers will never program a rotate instruction, so the need for unambiguous direction is weak at best. A single call is enough.

The problem that reawakened this discussion is that having both requires arguing about whether negative values are OK, and if not, what to do about them. By having only one, that whole argument falls away. It's cleaner design.

@ct1n
Copy link
Contributor

ct1n commented Apr 5, 2017

I somewhat sympathize with the argument about clean design but it seems weird that you have to remove "Right" from "RotateRight", while keeping the same implementation, in order to achieve it. In practical terms the only way it seems to answer questions is by forcing people who see it to read the documentation, by way of the questions the name raises.
In the end it's a matter of unambiguous direction vs unambiguous behavior for negative values. Negative values should probably be less of a concern in the common case.

What I'm saying is that Rotate raises one question for all people, answering it indirectly through documentation.
RotateRight raises one question for very few people who can (and should) read the documentation if they care about it.

On the other hand Rotate would probably prevent people from writing if n < 0 { RotateLeft(...) } else { RotateRight(...) }.

@rsc
Copy link
Contributor

rsc commented Apr 10, 2017

@golang/proposal-review discussed this and ended up at having just one function, but naming it RotateLeft, not just Rotate, to be extra clear at call sites. Negative numbers rotate right, and documentation will make this clear.

gopherbot pushed a commit that referenced this issue Apr 11, 2017
For details see the discussion on the issue below.

RotateLeft functions can now be inlined because the don't panic
anymore for negative rotation counts.

name            old time/op  new time/op  delta
RotateLeft-8    6.72ns ± 2%  1.86ns ± 0%  -72.33%  (p=0.016 n=5+4)
RotateLeft8-8   4.41ns ± 2%  1.67ns ± 1%  -62.15%  (p=0.008 n=5+5)
RotateLeft16-8  4.46ns ± 6%  1.65ns ± 0%  -63.06%  (p=0.008 n=5+5)
RotateLeft32-8  4.50ns ± 5%  1.67ns ± 1%  -62.86%  (p=0.008 n=5+5)
RotateLeft64-8  4.54ns ± 1%  1.85ns ± 1%  -59.32%  (p=0.008 n=5+5)

https://perf.golang.org/search?q=upload:20170411.4

(Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.)

For #18616.

Change-Id: I0828d80d54ec24f8d44954a57b3d6aeedb69c686
Reviewed-on: https://go-review.googlesource.com/40394
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
lparth pushed a commit to lparth/go that referenced this issue Apr 13, 2017
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their call.  Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.

name           old time/op  new time/op  delta
OnesCount-8    2.47ns ± 5%  1.04ns ± 2%  -57.70%  (p=0.000 n=10+10)
OnesCount16-8  1.05ns ± 1%  0.78ns ± 0%  -25.56%    (p=0.000 n=9+8)
OnesCount32-8  1.63ns ± 5%  1.04ns ± 2%  -35.96%  (p=0.000 n=10+10)
OnesCount64-8  2.45ns ± 0%  1.04ns ± 1%  -57.55%   (p=0.000 n=6+10)

Update golang#18616

Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
lparth pushed a commit to lparth/go that referenced this issue Apr 13, 2017
For details see the discussion on the issue below.

RotateLeft functions can now be inlined because the don't panic
anymore for negative rotation counts.

name            old time/op  new time/op  delta
RotateLeft-8    6.72ns ± 2%  1.86ns ± 0%  -72.33%  (p=0.016 n=5+4)
RotateLeft8-8   4.41ns ± 2%  1.67ns ± 1%  -62.15%  (p=0.008 n=5+5)
RotateLeft16-8  4.46ns ± 6%  1.65ns ± 0%  -63.06%  (p=0.008 n=5+5)
RotateLeft32-8  4.50ns ± 5%  1.67ns ± 1%  -62.86%  (p=0.008 n=5+5)
RotateLeft64-8  4.54ns ± 1%  1.85ns ± 1%  -59.32%  (p=0.008 n=5+5)

https://perf.golang.org/search?q=upload:20170411.4

(Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.)

For golang#18616.

Change-Id: I0828d80d54ec24f8d44954a57b3d6aeedb69c686
Reviewed-on: https://go-review.googlesource.com/40394
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot
Copy link

CL https://golang.org/cl/40394 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/41630 mentions this issue.

@griesemer
Copy link
Contributor

The original proposal plus a few extra functions have been designed and implemented at this point. We may add to this library over time, but it seems reasonably "complete" for now.

Most notably, we have not decided upon or implemented functionality to:

  • test whether add/mul/etc overflow
  • provide functions to implement add/mul/etc that accept an incoming carry and produce a result plus carry (or higher word)

Personally I'm not convinced those belong into a "bits" package (maybe the tests do). Functions to implement multi-precision add/sub/mul would allow a pure Go implementation of some of the math/big kernels, but I don't believe the granularity is right: What we want there is optimized kernels working on vectors, and maximum performance for those kernels. I don't believe we can achieve that with Go code depending on add/sub/mul "intrinsics" alone.

Thus, for now I like to close this issue as "done" unless there are major objections. Please speak up over the next week or so if you are against closing this.

@jimmyfrasche
Copy link
Member

I'm in favor of adding functions along those lines.

I strongly believe that they belong in their own package, if for no other reason than to give it a name that better reflects their collective functionality.

👍 on closing this issue and ❤️ for the work done so far.

@griesemer
Copy link
Contributor

Closing since there were no objections.

gopherbot pushed a commit that referenced this issue May 10, 2017
This adds math/bits intrinsics for OnesCount, Len, TrailingZeros on
ppc64x.

benchmark                       old ns/op     new ns/op     delta
BenchmarkLeadingZeros-16        4.26          1.71          -59.86%
BenchmarkLeadingZeros16-16      3.04          1.83          -39.80%
BenchmarkLeadingZeros32-16      3.31          1.82          -45.02%
BenchmarkLeadingZeros64-16      3.69          1.71          -53.66%
BenchmarkTrailingZeros-16       2.55          1.62          -36.47%
BenchmarkTrailingZeros32-16     2.55          1.77          -30.59%
BenchmarkTrailingZeros64-16     2.78          1.62          -41.73%
BenchmarkOnesCount-16           3.19          0.93          -70.85%
BenchmarkOnesCount32-16         2.55          1.18          -53.73%
BenchmarkOnesCount64-16         3.22          0.93          -71.12%

Update #18616

I also made a change to bits_test.go because when debugging some failures
the output was not quite providing the right argument information.

Change-Id: Ia58d31d1777cf4582a4505f85b11a1202ca07d3e
Reviewed-on: https://go-review.googlesource.com/41630
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
@dskinner
Copy link
Member

This is a comment for future decisions regarding API, I understand this particular one is set.

Rotate is a specialist function; LTR or RTL is only relevant given the context. @aclements brought up a valid question, not a valid ever-extending point. His question could have been answered as "it's RTL, just as integers increment"; easy, right?

But instead, cleverness ensues.

What "a specialist function" means is such a simple thing, it was likely dismissed to quickly. Given a code sample, one likely already understands the rotate to occur and the direction even before encountering the line of code. Such code is usually already preceded by illustrative ascii documentation as it is.

What's mentally turbulent is not that Go could have simply chosen RTL as a standard way of interpreting bits from an API perspective, but rather, I first pulled up the changes of 1.9 and find a RotateLeft with no counterpart and the doc giving an example of a negative stride. This is a mind-numbing committee-like decision that is very unfortunate to be landing in 1.9.

I only plead to stick to context of usage for the future. All of this should have been self-evident with questions like, "why are we not providing a counterpart to RotateLeft, why are we panic'ing on negative strides or debating int vs uint for a stride"; ultimately, because I think what "a specialist function" means was simply dismissed to easily for not being clever.

Let us please avoid cleverness in our justification of APIs. It shows in this 1.9 update.

@gopherbot
Copy link

Change https://golang.org/cl/90835 mentions this issue: cmd/compile: arm64 intrinsics for math/bits.OnesCount

gopherbot pushed a commit that referenced this issue Feb 15, 2018
This adds math/bits intrinsics for OnesCount on arm64.

name         old time/op  new time/op  delta
OnesCount    3.81ns ± 0%  1.60ns ± 0%  -57.96%  (p=0.000 n=7+8)
OnesCount8   1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
OnesCount16  2.41ns ± 0%  1.60ns ± 0%  -33.61%  (p=0.000 n=8+8)
OnesCount32  4.17ns ± 0%  1.60ns ± 0%  -61.58%  (p=0.000 n=8+8)
OnesCount64  3.80ns ± 0%  1.60ns ± 0%  -57.84%  (p=0.000 n=8+8)

Update #18616

Conflicts:
	src/cmd/compile/internal/gc/asm_test.go

Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb
Reviewed-on: https://go-review.googlesource.com/90835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@golang golang locked and limited conversation to collaborators Jan 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests