math/bits: guarantee Add, Sub, Mul, RotateLeft, ReverseBytes to be constant-time operations #31267
RISC-V can generate the carry in constant time using an SLTU instruction.
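To make that concrete, here is a minimal branchless sketch (an illustration written for this discussion, not code from the thread or from math/bits): the unsigned comparisons are exactly what an instruction like SLTU materializes into a register, so the carry can be produced without branching on the operands.

```go
// addWithCarry returns x + y + carryIn (carryIn must be 0 or 1) and the
// carry-out as a 0/1 value, using only adds and unsigned comparisons.
// At the machine level a comparison such as "partial < x" is what
// RISC-V's SLTU computes directly into a register.
func addWithCarry(x, y, carryIn uint64) (sum, carryOut uint64) {
	partial := x + y
	c1 := b2u(partial < x) // carry out of x + y
	sum = partial + carryIn
	c2 := b2u(sum < partial) // carry out of (x + y) + carryIn
	carryOut = c1 | c2       // at most one of the two can be set
	return
}

// b2u converts a bool to 0 or 1. Whether a compiler emits a branch or a
// flag/SLTU-style instruction for this is up to the backend; the point
// is that the carry is expressible without branching on the operands.
func b2u(b bool) uint64 {
	if b {
		return 1
	}
	return 0
}
```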
The change for #31229 results in a negligible (if any) impact on performance. Also keep in mind that the software fallback would rarely be executed. Here is the benchstat output:
@smasher164 FYI, that benchmark looks to be testing the intrinsic path and not the pure Go fallback. When you test the pure Go version, you should run multiple counts so the comparison shows statistically significant changes. If carries are unpredictable (adding random numbers), then the constant-time (branchless) code will likely be faster than the prior implementation, because the carry branches will be poorly predicted.
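One way to observe that effect is to benchmark a branching and a branchless carry side by side on random operands. The following is a rough, self-contained sketch (the helper functions and names are made up for illustration; they are stand-ins, not the math/bits code):

```go
package bitsbench

import (
	"math/rand"
	"testing"
)

var sink uint64

// addBranchy is a stand-in for the branching fallback style: the
// carry-out is produced with a conditional.
func addBranchy(x, y, carry uint64) (sum, carryOut uint64) {
	yc := y + carry
	sum = x + yc
	if sum < x || yc < y {
		carryOut = 1
	}
	return
}

// addBranchless is a stand-in for the constant-time style: the
// carry-out is computed with AND/OR/AND-NOT and a shift.
func addBranchless(x, y, carry uint64) (sum, carryOut uint64) {
	sum = x + y + carry
	carryOut = ((x & y) | ((x | y) &^ sum)) >> 63
	return
}

// benchAdd feeds pseudo-random operands so that, for the branching
// variant, the carry branch is unpredictable. The indirect call keeps
// both variants on an equal footing but inhibits inlining, so treat
// the numbers as rough.
func benchAdd(b *testing.B, add func(x, y, carry uint64) (uint64, uint64)) {
	r := rand.New(rand.NewSource(1))
	xs, ys := make([]uint64, 1024), make([]uint64, 1024)
	for i := range xs {
		xs[i], ys[i] = r.Uint64(), r.Uint64()
	}
	b.ResetTimer()
	var sum, carry uint64
	for i := 0; i < b.N; i++ {
		sum, carry = add(xs[i&1023], ys[i&1023], carry)
	}
	sink = sum + carry
}

func BenchmarkAddBranchy(b *testing.B)    { benchAdd(b, addBranchy) }
func BenchmarkAddBranchless(b *testing.B) { benchAdd(b, addBranchless) }
```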
It seems that Add64 is the most significantly impacted benchmark, while the others have a ~30% slowdown.
I prefer making the default secure and, if necessary, offering a clearly named fast variant.
For someone doing a code review, it's reasonable to expect the unqualified function to be the safe one. When there are a safe and an unsafe API, the unsafe one must be the qualified one.
That is only true for crypto; constant time has no use in any other domain. From that perspective it's reasonable to expect the unqualified function to be the fast one.
The choice of preserving semantics and timing by default should depend on the chance that the fallback is executed. For example, the FMA intrinsic proposed in #25819 has a decent chance of hitting the fallback on x86, arm, mips, and mips64. Additionally, the FMA software fallback is orders of magnitude slower than a simple multiply and add. On the other hand, the chance of the bits.Add fallback being executed is much lower, given the add-with-carry instructions on most architectures. Additionally, the slowdown introduced by a constant-time fallback is in the worst case a factor of two.
I find it appropriate that the FMA functions live in math, while Add, Sub and Mul live in math/bits. That allows us to build the expectation that math/bits is constant time unless the name is screaming otherwise, which is not true of math or math/big. I am firmly convinced that even if safety is only relevant to one use case, the default should be safe, and an unsafe version can be added with a clear name. So if anyone thinks it's already time to add such a variable-time variant, we can discuss that.
@FiloSottile operations in math/bits are not documented as being constant time, so I did not have that expectation, though it makes sense now that you say it. Should it be mentioned in the package documentation?
@smasher164 The chance of hitting a fallback might change over time, as hardware advances. I'm thinking about FMA the other way around: that is something the compiler should deal with.

@FiloSottile Obviously we don't agree on this one, but I find you want to close it way too early, given how controversial the issue is (based on the number of up- and down-votes). I would also find it interesting what other core devs think about this.

Regarding whether to call something "constant time" or "variable time": I think the most common use case should take precedence in naming over the special case. For example, a "hash function" is usually called just that, as evidenced on Wikipedia, whereas to signify the importance of other properties one usually calls it a "cryptographic hash function". Meaning the common case should get the plain name and the special case the qualifier.

So far we have only discussed a few of these functions. One of the goals @robpike stated in #21835 is to make these operations fast.
That chance is only going to go down. Once the intrinsic is implemented, it isn't going to be reverted (unless an instruction is removed from an instruction set, which never happens).
I see this as pretty much a non-issue. Practically speaking, no one is using fallbacks (and if they are, it isn't hard to contribute intrinsics). Processors today need to provide constant-time implementations of these operations for crypto libraries anyway, we might as well use them. And if you look at, say, the assembly code for math/big, all that assembly uses those constant-time implementations - there isn't a faster non-constant-time version that math/big feels like it needs to use. Sure, theoretically, a new faster non-constant-time multiply might show up in x86 next year, or some new popular processor architecture might show up that only has non-constant-time ops. But both of those seem to me like really unlikely events that we shouldn't design our systems around.
I agree that div is a special case, and we probably don't want to (can't?) guarantee constant time.
Or, let the constant-time operations live in a separate package.
"Security" is not a good enough reason to penalize real Go code with an 80% slowdown That said, the intention from the start was that you'd get intrinsics in any real Go code. The open question then is whether defining these to be constant time Thanks. |
I can confirm that all of our architectures have instructions that can be used for constant-time add and sub (with the caveat that it's hard to tell whether x86 intrinsics do constant-time mul). I don't know for sure about other architectures, but my suspicion is they do. Again, these are the same instructions as would be used in non-constant-time situations. The slowdown is a lot less than 80%. That number might be old, before some optimizations went in. Disabling intrinsics and comparing CL 170758 before and after, I get much more equivocal results:
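As an aside on the mul case: a pure Go 64×64→128 multiply can be written with only multiplies, shifts, masks, and adds, so the fallback has nothing to branch on. A sketch in that style (illustrative; not necessarily the exact standard-library code):

```go
// mul64 returns the 128-bit product of x and y as (hi, lo): split each
// operand into 32-bit halves, form the partial products, and recombine.
// Every step is a multiply, shift, mask, or add; no step depends on the
// operand values for control flow.
func mul64(x, y uint64) (hi, lo uint64) {
	const mask32 = 1<<32 - 1
	x0, x1 := x&mask32, x>>32
	y0, y1 := y&mask32, y>>32

	w0 := x0 * y0
	t := x1*y0 + w0>>32
	w1 := t&mask32 + x0*y1
	w2 := t >> 32

	hi = x1*y1 + w2 + w1>>32
	lo = x * y
	return
}
```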
Those benchmarks also favor the branching implementations, because the branch is perfectly predictable. In real-world uses, the branch is perfectly random.
@rsc Again, I am not arguing not to provide the fast option. My argument is that if we need to have two intrinsics, one constant time and one variable time, the default one should be the safe one, and the fast and unsafe one should be the qualified one, not the other way around. If you have a "slow" Add in a hot loop, the profiler will tell you, and you can go replace it with a faster unsafe version if appropriate. If you have a variable time Add in a crypto implementation, nothing will tell you. Anyway, this sounds moot because it looks like there are not in fact two classes of intrinsics.
I think the point here is not just about these particular functions. The question is whether, and where, the constant-time guarantee should be made. There are some options.
Obviously, option 1 is not OK. So I think option 3 is the acceptable one. Maybe a new package would be the place for that.
There has been some discussion about how to test whether code is constant-time.
@rsc I double-checked with Agner Fog's "Instruction tables" and can confirm that there is no constant-time Div64 instruction in any of the recent microarchitectures by Intel or AMD, nor in ARM Cortex A57/A72 according to their "Software Optimization Guide", which includes the following:
One option would be to have a separate constant-time package.
I feel like we are solving a problem that does not exist yet. It looks like all current architectures have fast, constant-time instructions for implementing the intrinsics. That should make the fallbacks nearly irrelevant, so their performance shouldn't matter. I would be more than happy with a package comment like the following.
This avoids locking ourselves into an answer for now, so we can make a decision later with more information if something changes, instead of speculating. And in the meantime no cryptography implementor expects division to be constant time anyway. |
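The proposed comment text isn't reproduced above. Judging from the phrase quoted in the replies below ("Future additions might not be constant time."), it was along these lines; the sketch here is a reconstruction for illustration, not the original wording:

```go
// Package bits implements bit counting and manipulation functions for
// the predeclared unsigned integer types.
//
// Add, Sub, Mul, RotateLeft, and ReverseBytes (and their sized variants)
// are guaranteed to take constant time, independent of the values of
// their inputs. Future additions might not be constant time.
package bits
```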
One either guarantees that a function or package is constant time, or one does not. "Future additions might" means that backward compatibility could be broken, which of course cannot be allowed. The very issue is about "locking ourselves into an answer", and a much better solution is to let crypto functions live in a crypto package.
As for constant-time cryptography code, I think it is much more important to have tooling to test whether the code is constant-time.
And this issue is an example of the problem that #17373 is trying to solve.
Because user-defined assembly functions can't be intrinsified, the code must be placed in the standard library.
As far as I know, making a tool to ensure constant-time code is an unsolved problem in cryptography engineering, at least in the general case. The attempts I know about are also dynamic, not static, so not really an option for cross-platform development. Also, the Go 1 Compatibility Promise applies to functions, not packages, so "Future additions might not be constant time." is not at all a violation, as long as the functions that exist stay constant time.
I agree that it is not a violation. And the compatibility promise means those functions must stay constant-time forever.
It doesn't sound like we should say "All but these" are constant time. Enumerate the ones we want to be constant-time, document each one as such, and then don't worry about having to say "Future ones may not fit the blanket rule." It doesn't seem like LeadingZeros64 is particularly easy to do in constant time, for example (absent hardware support). It would be fine to just start with Add, Sub, Mul (certainly not Div). Filippo, are there others that are important?
Sounds good to me. (Including sized variants, of course.) We definitely use RotateLeft, too, and I can see a use for ReverseBytes. Their fallbacks are already constant time.
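For reference, sketches in the style of those pure Go fallbacks (illustrative; not necessarily the exact standard-library code) show why they are constant time: only fixed shifts, masks, and ORs, with no data-dependent branches or table lookups.

```go
// rotateLeft64 rotates x left by k bits. Masking k with 63 bounds the
// shift amount without a branch; the shifts and OR take the same time
// regardless of the value of x.
func rotateLeft64(x uint64, k int) uint64 {
	const n = 64
	s := uint(k) & (n - 1)
	return x<<s | x>>(n-s)
}

// reverseBytes64 reverses the byte order of x with three rounds of
// mask-and-shift: swap adjacent bytes, then 16-bit halves, then the
// two 32-bit halves.
func reverseBytes64(x uint64) uint64 {
	x = (x>>8)&0x00ff00ff00ff00ff | (x&0x00ff00ff00ff00ff)<<8
	x = (x>>16)&0x0000ffff0000ffff | (x&0x0000ffff0000ffff)<<16
	return x>>32 | x<<32
}
```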
OK, then let's start with Add, Sub, Mul, RotateLeft, ReverseBytes for now (including sized variants).
I removed the Documentation label because there is code work to do in the fallbacks too.
It seems that the ReverseBytes and RotateLeft fallbacks are already constant time. Should the change then modify Add and Sub, while documenting Add, Sub, Mul, RotateLeft, and ReverseBytes to be constant time?
Change https://golang.org/cl/170758 mentions this issue:
Ah, right, reopening to document the property. I'll send a CL tomorrow.
Change https://golang.org/cl/178177 mentions this issue:
I'm a heavy non-crypto user of bits.* and I'm afraid changes like #31229 will affect my performance. One way of keeping crypto people happy as well as users like me would be to have separate functions for non-constant-time and constant-time addition, multiplication, and so on.

In the issue above I was reminded that the change affects only the fallbacks. I'm afraid that promising the fallbacks are constant time will later lead to constant-time native implementations, which could very well be slower than their non-constant-time alternatives. For example, RISC-V and WebAssembly are platforms that don't have a carry flag, and there it could happen that "a branch works out better" [1].
[1] https://gmplib.org/manual/Assembly-Carry-Propagation.html