
math/bits: guarantee Add, Sub, Mul, RotateLeft, ReverseBytes to be constant-time operations #31267

Closed
TuomLarsen opened this issue Apr 5, 2019 · 37 comments
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted
Comments

@TuomLarsen

I'm a heavy non-crypto user of bits.* and I'm afraid changes like #31229 will affect my performance. One way to keep both crypto people and users like me happy would be to have separate functions for non-constant-time and constant-time addition, multiplication, and so on.

In the issue above I was reminded that the change affects only the fallbacks. I'm afraid that promising the fallbacks are constant time will later lead to constant-time native implementations, which could very well be slower than their non-constant-time alternatives. For example, RISC-V and WebAssembly are platforms without a carry flag, and there it could happen that "a branch works out better" [1].

[1] https://gmplib.org/manual/Assembly-Carry-Propagation.html

@gopherbot gopherbot added this to the Proposal milestone Apr 5, 2019
@TuomLarsen TuomLarsen changed the title proposal: bits.ContantTimeAdd, bits.ContantTimeMul proposal: bits.ConstantTimeAdd, bits.ConstantTimeMul Apr 5, 2019
@randall77
Contributor

RISC-V can generate the carry in constant time using an SLTU instruction.
WebAssembly has a similar instruction; I don't know whether it guarantees constant time or not.

@smasher164
Member

smasher164 commented Apr 5, 2019

The change for #31229 results in a negligible (if any) impact on performance. Also keep in mind that the software fallback would rarely be executed. Here is the benchstat output:

name             old time/op  new time/op  delta
Add-4            1.44ns ± 0%  1.63ns ± 0%   ~     (p=1.000 n=1+1)
Add32-4          1.08ns ± 0%  1.05ns ± 0%   ~     (p=1.000 n=1+1)
Add64-4          1.46ns ± 0%  1.43ns ± 0%   ~     (p=1.000 n=1+1)
Add64multiple-4  0.73ns ± 0%  0.99ns ± 0%   ~     (p=1.000 n=1+1)
Sub-4            1.42ns ± 0%  1.42ns ± 0%   ~     (all equal)
Sub32-4          1.08ns ± 0%  1.47ns ± 0%   ~     (p=1.000 n=1+1)
Sub64-4          1.42ns ± 0%  1.47ns ± 0%   ~     (p=1.000 n=1+1)
Sub64multiple-4  0.72ns ± 0%  0.72ns ± 0%   ~     (p=1.000 n=1+1)

@bmkessler
Contributor

@smasher164 FYI, that benchmark looks to be testing the intrinsic path rather than the pure Go fallback. When you test the pure Go version, you should run multiple counts so benchstat can detect significant changes.

If carries are unpredictable (adding random numbers), then the constant time (branchless) code will likely be faster than the prior implementation because the carry branches will be poorly predicted.

@smasher164
Member

smasher164 commented Apr 5, 2019

name             old time/op  new time/op  delta
Add-4            1.16ns ±11%  1.51ns ± 5%  +30.52%  (p=0.008 n=5+5)
Add32-4          1.08ns ± 0%  1.03ns ± 1%   -4.86%  (p=0.029 n=4+4)
Add64-4          1.09ns ± 1%  1.95ns ± 3%  +79.23%  (p=0.008 n=5+5)
Add64multiple-4  4.03ns ± 1%  4.55ns ±11%  +13.07%  (p=0.008 n=5+5)
Sub-4            1.08ns ± 1%  1.50ns ± 0%  +38.17%  (p=0.016 n=5+4)
Sub32-4          1.09ns ± 2%  1.53ns ±10%  +40.26%  (p=0.008 n=5+5)
Sub64-4          1.10ns ± 1%  1.47ns ± 1%  +33.39%  (p=0.008 n=5+5)
Sub64multiple-4  4.30ns ± 2%  4.08ns ± 4%   -5.07%  (p=0.032 n=5+5)

It seems that Add64 is the most significantly impacted benchmark, while the others have a ~30% slowdown.

@FiloSottile
Contributor

I prefer making the default secure, and if necessary offer a VariableTimeAdd which communicates the risk. This also has the advantage of letting us cross that bridge when and if we come to it. (That is, if it ever turns out a native implementation would actually be faster if not constant time.)

@TuomLarsen
Author

If Add64 is implemented as ADC on platforms which have such instruction, there is nothing "variable" in it. I think "constant" communicates better that it is always a constant time operation, no matter if ADC or a fallback is used.

@FiloSottile
Contributor

For someone doing a code review, it's reasonable to expect bits.Add to be constant time, and it requires prior knowledge to know that no, you should use bits.ConstantTimeAdd instead. Making bits.Add the constant time one solves this problem. No one will be upset to find out that in fact VariableTimeAdd is constant time sometimes, the other way around is not true.

When there are a safe and an unsafe API, the unsafe one must be the qualified one.

@TuomLarsen
Author

That is only true for crypto; constant time has no use in any other domain. From that perspective, it's reasonable to expect that bits.Add simply adds two numbers, constant time or not.

@smasher164
Member

The choice between preserving semantics and preserving timing by default should depend on the chance that the fallback is executed. For example, the FMA intrinsic proposed in #25819 has a decent chance of hitting the fallback on x86, arm, mips, and mips64. Additionally, the software FMA fallback is orders of magnitude slower than a simple multiply and add. On the other hand, the chance of the bits.Add fallback being executed is much lower, given the add-with-carry instructions on most architectures, and the slowdown introduced by a constant-time fallback is, in the worst case, a factor of two.

@FiloSottile
Contributor

FiloSottile commented Apr 9, 2019

I find it appropriate that the FMA functions live in math, while Add, Sub and Mul live in math/bits.

That allows us to build the expectation that math/bits is constant time unless the name is screaming otherwise, which is not true of math, or math/big.

I am firmly convinced that even if safety is only relevant to one use case, the default should be safe, and an unsafe version can be added with a clear name. So if anyone thinks it's already time to add VariableTimeAdd, VariableTimeSub and VariableTimeMul, we can make this proposal about that. Otherwise, I think we should close this for now.

@jimmyfrasche
Member

@FiloSottile operations in math/bits are not documented as being constant time so I did not have that expectation, though it makes sense now that you say it. Should it be mentioned in the package documentation?

@TuomLarsen
Author

@smasher164 The chance of hitting a fallback might change over time, as hardware advances. I'm thinking about FMA the other way around: the compiler should do with a*b+c whatever it pleases, but when explicitly asked for FMA it should perform just one rounding. Similarly, Add should do whatever is needed to make it fast, and ConstantTimeAdd would guarantee what it says.

@FiloSottile Obviously we don't agree on this one, but I think you want to close it way too early, given how controversial the issue is (based on the number of up- and down-votes). I would also find it interesting what other core devs think about this.

Regarding whether to call something "constant time" or "variable time": I think the most common use case should take naming precedence over the special case. For example, a "hash function" is usually called just that, as evidenced on Wikipedia, whereas to signify the importance of other properties one usually says "cryptographic hash function". So Add should do whatever the compiler pleases to make it performant, and only the "crypto" Add should guarantee constant-time operation.

So far we have only discussed Add and Mul, but there is also Div. At least on x86, the DIVQ instruction has variable latency, so I would assume Div always needs to call a fallback in order to preserve constant-time behavior.

One of the goals @robpike stated in #21835 is to "make math/bits support its operations more efficiently through the compiler". This change would make it less performant, not more. In the same thread, @josharian implemented PCG using math/bits, and there is no need for a constant-time Add there, as the generator is not intended for crypto in the first place.

@randall77
Contributor

The chance of hitting a fallback might change over time, as hardware advances.

That chance is only going to go down. Once the intrinsic is implemented, it isn't going to be reverted (unless an instruction is removed from an instruction set, which never happens).

I would also find interesting what other core devs think about this.

I see this as pretty much a non-issue. Practically speaking, no one is using fallbacks (and if they are, it isn't hard to contribute intrinsics). Processors today need to provide constant-time implementations of these operations for crypto libraries anyway, we might as well use them. And if you look at, say, the assembly code for math/big, all that assembly uses those constant-time implementations - there isn't a faster non-constant-time version that math/big feels like it needs to use.

Sure, theoretically, a new faster non-constant-time multiply might show up in x86 next year, or some new popular processor architecture might show up that only has non-constant-time ops. But both of those seem to me like really unlikely events that we shouldn't design our systems around.

So far we only discussed Add and Mul but there is also Div. At least on z86 DIVQ instruction has variable latency so I would assume it always needs to call a fallback, in order to preserve constant time performance.

I agree that div is a special case, and we probably don't want to (can't?) guarantee constant time.

@TuomLarsen
Author

Or, let the constant time operations live in crypto. There is already hash/fnv and there is crypto/sha1, math/rand and crypto/rand. And of course crypto/subtle, with ConstantTimeCompare, ConstantTimeCopy, ConstantTimeLessOrEq, ... There could also very well be math/bits and crypto/bits, as they similarly serve different purposes. Or they could share the space in subtle with the other ConstantTime* functions.

@rsc
Contributor

rsc commented Apr 10, 2019

"Security" is not a good enough reason to penalize real Go code with an 80% slowdown
(as in the Add64 in the CL).

That said, the intention from the start was that you'd get intrinsics in any real Go code.
Any time the fallback is too slow the answer is certainly "get intrinsics added on that architecture."
Slowing down the fallback by 80% to match the constant-time semantics of the intrinsics
is OK. But should we define that the intrinsics are constant-time?

The open question then is whether defining these to be constant time
would fundamentally slow down any intrinsic implementation (perhaps one
yet to be written). It doesn't seem like it should.
Can someone confirm that intrinsics would not be affected at all
(even on other systems yet to be implemented) if we change the
definition of math/bits to be constant-time for these operations?

Thanks.

@randall77
Contributor

I can confirm that all of our architectures have instructions that can be used for constant time add and sub (with the caveat that it's hard to tell whether wasm ops are really constant time). Those are the same instructions as would be used for cases that didn't need constant time (in math/big, say).

x86 intrinsics do constant time mul. I don't know for sure about other architectures, but my suspicion is they do. Again, these are the same instructions as would be used in non-constant-time situations.
(If mul wasn't constant time - what would crypto libraries use?)

The slowdown is a lot less than 80%. That number might be old, before some optimizations went in. Disabling intrinsics and comparing CL 170758 before and after, I get much more equivocal results:

name             old time/op  new time/op  delta
Add-8            1.04ns ± 1%  1.30ns ± 1%  +25.79%         (p=0.000 n=9+10)
Add32-8          1.04ns ± 0%  0.79ns ± 1%  -24.44%          (p=0.000 n=7+9)
Add64-8          1.04ns ± 1%  1.31ns ± 0%  +25.48%         (p=0.000 n=10+8)
Add64multiple-8  4.52ns ± 2%  2.88ns ± 1%  -36.24%         (p=0.000 n=10+8)
Sub-8            1.04ns ± 2%  1.30ns ± 1%  +24.23%        (p=0.000 n=10+10)
Sub32-8          1.04ns ± 0%  1.04ns ± 0%     ~     (all samples are equal)
Sub64-8          1.05ns ± 0%  1.04ns ± 0%   -0.95%          (p=0.006 n=7+9)
Sub64multiple-8  4.29ns ± 3%  2.85ns ± 0%  -33.50%          (p=0.000 n=9+8)

@randall77
Contributor

Those benchmarks also favor the branching implementations, because the benchmark's carry branch is perfectly predictable. In real-world uses, the branch can be effectively random.

@FiloSottile
Contributor

"Security" is not a good enough reason to penalize real Go code with an 80% slowdown (as in the Add64 in the CL).

@rsc Again, I am not arguing not to provide the fast option. My argument is that if we need to have two intrinsics, one constant time and one variable time, the default one should be the safe one, and the fast and unsafe one should be the qualified one, not the other way around.

If you have a "slow" Add in a hot loop, the profiler will tell you, and you can go replace it with a faster unsafe version if appropriate. If you have a variable time Add in a crypto implementation, nothing will tell you.

Anyway, this sounds moot because it looks like there are not in fact two classes of intrinsics.

@crvv
Contributor

crvv commented Apr 11, 2019

I think the point here is not just about Add, Mul or Div.
I guess no one is worried about the speed of a constant-time Add or Mul. And Div is not the only function whose constant-time algorithm is non-trivial.

The question is whether math/bits should be guaranteed to be constant-time.

There are some options.

  1. All functions in math/bits are constant-time. The downside is that some functions like Div will be very slow. This is the point of the issue.
  2. Some functions in math/bits are constant-time and some are not. It will be documented.
  3. No function in math/bits is guaranteed to be constant-time.
  4. Add two div functions, Div and ConstantTimeDiv. And there is only one Add.
  5. Add two div functions, VariableTimeDiv and Div. And there is only one Add.
  6. Add two functions for all operations in math/bits. There will be Add and ConstantTimeAdd.
  7. Add two functions for all operations in math/bits. There will be VariableTimeAdd and Add.

Obviously, 1 is not OK.
2 and 4 are not safe.
For 5, almost everyone will use bits.VariableTimeDiv, which is kind of weird. And I need to remember that there is bits.VariableTimeDiv but there isn't bits.VariableTimeAdd.
I think 6 and 7 are worse than two packages, maybe math/bits and crypto/bits.

So I think 3 is the acceptable option. Maybe a new package crypto/bits will be added later.

@crvv
Contributor

crvv commented Apr 11, 2019

If you have a variable time Add in a crypto implementation, nothing will tell you.

There are some discussions about how to test whether the code is constant-time.
#20654 (comment)

@TuomLarsen
Author

@rsc I double-checked with Agner Fog's "Instruction tables" and can confirm that there is no constant-time Div64 instruction in any recent microarchitecture from Intel or AMD, nor in ARM Cortex-A57/A72 according to their "Software Optimization Guide", which includes the following:

Integer divides are performed using an iterative algorithm and block any subsequent divide operations until complete. Early termination is possible, depending upon the data values.

@bcmills bcmills added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Apr 11, 2019
@robpike
Contributor

robpike commented Apr 13, 2019

One option would be to have a separate constant-time package.

@FiloSottile
Contributor

I feel like we are solving a problem that does not exist yet. It looks like all current architectures have fast, constant-time instructions for implementing the intrinsics. That should make the fallbacks nearly irrelevant, so their performance shouldn't matter.

I would be more than happy with a package comment like the following

// All operations except Div are implemented in constant time.
// Future additions might not be constant time.

This avoids locking ourselves into an answer for now, so we can make a decision later with more information if something changes, instead of speculating. And in the meantime no cryptography implementor expects division to be constant time anyway.

@TuomLarsen
Author

One either guarantees that a function or package is constant time, or one does not. "Future additions might" means that backward compatibility could be broken, which of course cannot be allowed. This very issue is about "locking ourselves into an answer", and a much better solution is to let crypto functions live in a crypto package.

@crvv
Contributor

crvv commented Apr 18, 2019

As for constant-time cryptography code, I think it is much more important to have a tool that tests whether code is constant-time.
With that tool, anyone can use any library to write the code and verify that it is constant-time.
Without that tool, a constant-time math/bits isn't very helpful, because it is still very difficult to verify that an entire encryption implementation is constant-time.

@crvv
Contributor

crvv commented Apr 18, 2019

And this issue is an example which #17373 is trying to solve.

the Go 1 compatibility makes designing good abstraction pretty hard, and
because the nature of this problem, we can't design the API in a package in
a subrepo (e.g. x/simd) like context or http2 and later move it into std
when it's mature.

Because user-defined assembly functions can't be intrinsified, the code must be placed in the stdlib.
Then it becomes controversial, because it turns into a compatibility issue.

@FiloSottile
Contributor

As far as I know, making a tool to ensure constant time code is an unsolved problem in cryptography engineering, at least in the general case. The attempts I know about are also dynamic, not static, so not really an option for cross-platform development.

Also, the Go 1 Compatibility Promise applies to functions, not packages, so "Future additions might not be constant time." is not at all a violation, as long as the functions that exist stay constant time.

@crvv
Contributor

crvv commented Apr 18, 2019

I agree that it is not a violation.
But I don't think it is a good idea to have "Future additions might not be constant time" in the document.

And the compatibility promise means the functions must be constant-time forever.
If someone comes up with a faster LeadingZeros algorithm, the question becomes:
is it acceptable to add a new, faster VariableTimeLeadingZeros?

@rsc
Contributor

rsc commented Apr 24, 2019

It doesn't sound like we should say "All but these" are constant time. Enumerate the ones we want to be constant-time, document each one as such, and then don't worry about having to say "Future ones may not fit the blanket rule." It doesn't seem like LeadingZeros64 is particularly easy to do in constant time, for example (absent hardware support).

It would be fine to just start with Add, Sub, Mul (certainly not Div). Filippo, are there others that are important?

@rsc rsc changed the title proposal: bits.ConstantTimeAdd, bits.ConstantTimeMul proposal: math/bits: document Add, Sub, Mul as constant-time operations Apr 24, 2019
@FiloSottile
Contributor

Sounds good to me. (Including sized variants, of course.)

We definitely use RotateLeft, too, and I can see a use for ReverseBytes. Their fallbacks are already constant time.

@rsc
Contributor

rsc commented Apr 25, 2019

OK, then let's start with Add, Sub, Mul, RotateLeft, ReverseBytes for now (including sized variants).
Accepted.

@rsc rsc changed the title proposal: math/bits: document Add, Sub, Mul as constant-time operations math/bits: document Add, Sub, Mul, RotateLeft, ReverseBytes as constant-time operations Apr 25, 2019
@rsc rsc changed the title math/bits: document Add, Sub, Mul, RotateLeft, ReverseBytes as constant-time operations math/bits: guarantee Add, Sub, Mul, RotateLeft, ReverseBytes to be constant-time operations Apr 25, 2019
@rsc rsc added NeedsFix The path to resolution is known, but the work has not been done. and removed Documentation labels Apr 25, 2019
@rsc rsc modified the milestones: Proposal, Go1.14 Apr 25, 2019
@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Apr 25, 2019
@rsc
Contributor

rsc commented Apr 25, 2019

I removed the Documentation label because there is code work to do too in the fallbacks.
I milestoned it Go 1.14, but if it goes into Go 1.13 that's fine too.

@smasher164
Member

It seems that the ReverseBytes and RotateLeft fallbacks are already constant time. Should the change then modify Add and Sub, while documenting Add, Sub, Mul, RotateLeft, and ReverseBytes to be constant time?

@gopherbot

Change https://golang.org/cl/170758 mentions this issue: math/bits: make Add and Sub fallbacks constant time

@FiloSottile
Contributor

Ah, right, reopening to document the property. I'll send a CL tomorrow.

@gopherbot

Change https://golang.org/cl/178177 mentions this issue: math/bits: document that Add, Sub, Mul, RotateLeft, ReverseBytes are constant time

@golang golang locked and limited conversation to collaborators May 20, 2020