
proposal: math/big: support for constant-time arithmetic #20654

Closed
bford opened this issue Jun 13, 2017 · 65 comments
Labels: FrozenDueToAge, Proposal, Proposal-Crypto (proposal related to crypto packages or other security issues)
Milestone: Proposal

@bford (Contributor) commented Jun 13, 2017

Problem: Constant-Time Arithmetic for Cryptographic Uses

The math/big package naturally and inevitably gets used for cryptographic purposes, including in the standard Go crypto libraries. However, this usage is currently unsafe because math/big does not support constant-time operation and thus may well be leaking secret keys and other sensitive information via timing channels. This is a well-known problem already documented in math/big's godoc documentation.

A much more specific issue related to this was raised in 2011 (#2445) but was eventually closed after going unaddressed for too long.

See the preliminary companion patch 45490, a first cut at an implementation of this proposal: https://go-review.googlesource.com/c/45490/. The most important details and considerations are discussed here.

Alternative Approaches to Hardening Go's Cryptographic Code

There are several potential ways to address this weakness in Go's cryptographic packages, such as:

  • Change all crypto code to use different big-number-arithmetic libraries that do support constant-time operation. But I'm not aware of any such general-purpose constant-time math library currently in existence for Go, and creating one would entail significant work that, as far as I know, no one has done or is currently doing.

  • Change all crypto code to use hand-coded constant-time implementations of arithmetic operations specialized to particular groups, elliptic curves, etc., as the crypto/elliptic package currently does specifically for the P256 curve (but not P224, P384, or P521). This approach may yield maximum performance, but it again takes considerable effort and does not scale as a development practice, because the effort is specific to each particular curve or cryptographic scheme of interest. Perhaps most importantly, this effort has not actually been invested - it's one of those things "someone really should do" but no one actually does - leaving only the most popular curve (P256) with an arguably secure implementation and most of the rest of Go's cryptographic code in a decidedly risky state, a potential minefield of security vulnerabilities.

  • Change math/big to support constant-time operation within its general-purpose arithmetic framework. This of course is not completely trivial either, but upon some examination it does not appear to be inordinately difficult or particularly invasive. Exploring that option is the purpose of this issue.

Proposed API Modifications for Constant-Time Support

The essence of this proposal, from an API perspective, is simply to add one new property, represented by a getter-setter pair, to the big.Int type, to enable and configure constant-time operation. The key requirement this API must address is that constant-time arithmetic inherently requires the math library to know the maximum possible/allowed size of the result to be produced in a given context, instead of just using the "smallest" internal buffer that can hold the particular numeric value of the result.

A completely tentative signature for the API addition is as follows:

func (x *Int) BitCap() int
func (x *Int) SetBitCap(cap int) *Int

A caller invokes SetBitCap on an Int instance to request constant-time operation for subsequent operations invoked on that instance (x), i.e., with x as the destination. The caller must specify via the 'cap' argument the maximum number of bits (the "bit-capacity") of results that will need to be computed into this Int instance. Once configured in this way, big.Int promises the caller that, for operations invoked on this instance that support constant-time operation (and not all do, or necessarily will), (a) the output will always be produced in a buffer sized to hold 'cap' bits regardless of the numeric value of the result, and (b) the operation's timing will depend only on the lengths, and not the numeric contents, of all input operands.

For cryptographic use, it is the responsibility of the caller to ensure that all big.Int instances holding secret keys or other sensitive information, or values computed using such secrets, are configured properly to a fixed size via SetBitCap() before loading sensitive information into them (e.g., via SetBytes or arithmetic operations). When performing an arithmetic operation such as Add, Mul, or Exp on a destination big.Int configured for constant-time operation, the API does not require that all inputs be configured to be the same size, or necessarily to be configured via SetBitCap at all. For example, if crypto code calls z.Add(x,y) in a situation in which operand x is secret/sensitive but operand y is a well-known or otherwise insensitive public value, then operand x should have been likewise configured for constant-time via SetBitCap, but the insensitive operand y need not be. Thus, this API does not create a "separate and incompatible" kind of big.Int for constant-time operation, and allows constant-time and variable-time big.Int instances to be used together as appropriate depending on the sensitivity of particular values in a cryptographic computation. This of course places a burden on the calling crypto code to keep track of which values are sensitive and which aren't, but the developers of crypto code generally face that burden anyway.
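For concreteness, here is a minimal sketch of the intended calling pattern, assuming the proposed SetBitCap API from the patch (the function name, key size, and helper shape are illustrative only, not part of the proposal):

package example

import "math/big"

// hardenedAdd adds a public value to a secret scalar in constant time,
// assuming the proposed (hypothetical) SetBitCap API.
func hardenedAdd(keyBytes []byte, public *big.Int) *big.Int {
	secret := new(big.Int)
	secret.SetBitCap(256)     // fix the buffer size from public info first...
	secret.SetBytes(keyBytes) // ...then load the sensitive value

	sum := new(big.Int)
	sum.SetBitCap(256 + 1)         // sized for the largest possible result
	return sum.Add(secret, public) // the public operand needs no SetBitCap
}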

If the caller uses x.SetBitCap to configure a particular bit-capacity, but then invokes an operation (e.g., x.Add) that ends up computing a result too large for the configured bit-capacity, then the big.Int arithmetic operation panics if it detects the too-large result. The modified big.Int API does not promise to detect all such too-large results, however, because it internally rounds the requested bit-capacity up to the next machine word. (Note: if it is deemed desirable to enforce bit-capacity exactly, that would probably not be particularly difficult or expensive to do, so it's worth considering as an option. A point for discussion.)

The preliminary, basis-for-discussion patch (https://go-review.googlesource.com/c/45490/) includes a change to the crypto/dsa package illustrating the use of the above API. Note that this is just a temporary "for example" for API-discussion purposes. The patch does not even pretend to revise all the crypto packages, its example revision of the dsa package may itself still be imperfect/incomplete, and revising each of the crypto packages to support constant-time operation should probably be handled as separate changes once the big.Int API is decided on. The main point is that the code adds calls to SetBitCap on values such as 'k' that hold secret keys and sensitive intermediate values, while leaving big.Ints that hold only public values (such as DSA parameters) in the default variable-time configuration. This is intended to give a sense of the extent of changes I expect would typically be required for generic crypto code such as this to support constant-time operation with the proposed API.

Constant-Time Arithmetic Implementation Challenges

Although the API above is hopefully fairly simple and backward-compatible, we of course face several challenges and decisions in delving into the actual implementation details within the math/big package:

  • We don't want the constant-time arithmetic support to slow down non-constant-time arithmetic significantly, e.g., for general-purpose arithmetic that has nothing to do with crypto.

  • We would prefer that the internal code modifications not be overly invasive, or result in extensive code duplication (e.g., a constant-time and a non-constant-time version of everything).

  • The internal 'nat' type, which implements most of the arithmetic primitives that big.Int is actually built on, currently assumes fairly extensively that all 'nat' values are normalized, meaning the most-significant word in the slice is nonzero. Supporting constant-time operation requires relaxing this assumption at least during constant-time operation, since we need to ensure that 'nat' instances containing sensitive data are sized with respect to public information (i.e., the configured bit-capacity) irrespective of the specific numeric value that 'nat' instance holds (and in particular how many words it minimally requires to represent that numeric value).

  • A second-order challenge is that the internal 'nat' type is not defined as a struct but merely as a type alias for a Word slice. This is a problem because the most natural way to "propagate" the bit-capacity configuration from big.Int down into big.nat for constant-time arithmetic would be simply to add a configuration field to the nat type - which isn't directly possible given that nat is a word slice and not a struct. We could change nat into a struct, which would fortunately be invisible to the external API, but it would be a rather invasive change throughout the entire math/big package, since so much internal code assumes that nat is a word slice.

Implementation Approach

There are many likely-reasonable approaches to address these challenges, but here are the decisions and major changes reflected in the tentative, for-discussion patch (https://go-review.googlesource.com/c/45490/).

Normalized versus configured-length nat slices

To avoid duplicating much of the current 'nat' code into a separate-but-incompatible internal type for constant-time arithmetic, I chose simply to revise the 'nat' code, and the code that uses it, to stop assuming that a 'nat' is always normalized: i.e., to make nat operations tolerant of non-normalized inputs. In most cases this proves not particularly difficult. For example, nat.sub() must no longer assume that the case (m < n), where the first nat is a shorter word-slice than the second, is always an underflow, because the second might now be an un-normalized word-slice holding a value smaller than the first's. The modified 'nat' library still produces normalized values under default, variable-time operation; it merely no longer requires or expects this always to be the case.

Since constant-time operations need to produce nat slices of a fixed size potentially larger than their normalized minimum size, we need to parameterize those operations somehow, but as discussed above we cannot easily just add a configuration field to 'nat' because nat isn't a struct and it would be an extremely invasive change to make it one. Thus, the solution I adopted is to add a 'zcap' parameter to 'nat' operations that (now) support constant-time operation. The caller can pass 0 for the zcap parameter to indicate traditional, variable-time operation.

However, since even changing all calls to the 'nat' operations (nat.add, nat.mul, etc.) throughout the library to add a '0' parameter for variable-time invocations would itself be a fairly invasive change, I (tentatively) chose to minimize the invasiveness, at the expense of a little more code in nat.go, by creating new zcap-parameterized methods named with a 'c' prefix (nat.cadd, nat.csub, etc.), along with backward-compatibility methods with the original unmodified signatures that simply call the new 'c'-prefixed methods with 0 for the zcap parameter. I make no suggestion that this is the ideal solution for the long term; it was intended only to keep the invasiveness of the changes relatively minor for now, and I am completely open about the "right" way to parameterize nat with a configurable bit-capacity.
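Schematically, the shim pattern looks like this (a sketch with nat reproduced as a standalone type so it compiles on its own; in the patch these are methods in nat.go, and the cadd body is elided here):

package natsketch

type Word uintptr

// nat mirrors math/big's internal definition: a little-endian word slice.
type nat []Word

// add keeps the original variable-time signature and simply forwards.
func (z nat) add(x, y nat) nat {
	return z.cadd(0, x, y) // zcap == 0 selects traditional variable-time behavior
}

// cadd is the zcap-parameterized version: zcap > 0 requests constant-time
// operation with a result buffer sized for zcap bits. Only the calling
// convention is the point here; the arithmetic body is elided.
func (z nat) cadd(zcap int, x, y nat) nat {
	_ = zcap // real implementation: cmake/cnorm parameterized by zcap
	return z
}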

Constructing and normalizing nat instances

The main, global effect of the new 'zcap' parameter on the constant-time calls is to parameterize nat.cmake and nat.cnorm, the new versions of nat.make and nat.norm that support constant-time operation. In particular, nat.cmake now ensures that the word-slice it allocates is big enough for either the caller's indicated expected maximum result size (n) or the configured zcap, whichever is larger.

Most importantly, nat.cnorm still normalizes a slice down to the minimum representation size when zcap == 0, but when zcap > 0 it "normalizes" the slice to exactly the size corresponding to zcap. Support for constant-time operation thus effectively just changes the definition of normalization slightly: it means "normalize to minimum size" when zcap == 0 and "normalize to exactly zcap size" when zcap > 0. In many parts of the nat code, simply passing the zcap parameter down to cmake before an operation and to cnorm at the end is "mostly" enough to address the buffer-management challenges that constant-time operation presents.
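Here is a sketch of that revised normalization rule (Word and _W mirror math/big internals; the real cnorm is a nat method in the patch, and this standalone version assumes the caller allocated enough words via cmake):

package natsketch

import "math/bits"

type Word uintptr

const _W = bits.UintSize // bits per word, as in math/big

// cnorm normalizes z to its minimal length when zcap == 0, and to exactly
// the word length implied by zcap when zcap > 0.
func cnorm(z []Word, zcap int) []Word {
	if zcap == 0 {
		i := len(z)
		for i > 0 && z[i-1] == 0 {
			i-- // classic normalization: strip leading zero words
		}
		return z[:i] // length depends on the value; fine for public data
	}
	n := (zcap + _W - 1) / _W // round the bit-capacity up to whole words
	return z[:n]              // fixed, value-independent length
}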

Conditionally preserving variable-time optimizations

Throughout the implementations of the various arithmetic operations, mostly in nat.go, there are optimizations that are inherently non-constant-time. For example, basicMul avoids calling addMulVVW at all for multiplicand words that happen to be zero. Similarly, mul normalizes intermediate results in several places during its calculations. To avoid hurting the performance of big-integer arithmetic for general non-cryptographic purposes, I didn't want to remove these optimizations entirely, so instead I made them conditional on a test for 'zcap > 0'. Thus, the modified big.Int code should not perform noticeably worse under default variable-time operation, at least not due to lost optimization opportunities.

Note that for strict enforcement of constant-time operation, these tests sometimes assume that the Go compiler preserves the natural left-to-right evaluation order of the && and || operators, and I'm not familiar enough with the Go compiler to know yet whether that may be safely relied on. In the example of basicMul, we call addMulVVW 'if zcap > 0 || d != 0', which will be properly constant-time if the Go compiler either (a) evaluates this expression to a single 'bool' value and then performs only a single conditional branch on the result of the full expression, or (b) uses two conditional branches, first branching on the result of 'zcap > 0' and then branching on the result of 'd != 0' only if zcap == 0. If the Go compiler were to get overly smart and decide to evaluate and branch on 'd != 0' first, however, it would technically break constant-time operation (though in this case the resulting timing leak would probably be extremely small and hard to exploit). This general risk of an overly-smart compiler interfering with developers' attempts to express constant-time code is common, and not remotely unique to Go.
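To illustrate the pattern, here is a sketch of the guarded inner loop (Word mirrors math/big's internal type, and addMulVVW is given a generic math/bits body here so the sketch stands alone; the real code uses math/big's internal, often assembly-backed, primitive):

package natsketch

import "math/bits"

type Word uintptr

// addMulVVW computes z += x*y word-wise and returns the carry, mirroring
// the signature of math/big's internal primitive.
func addMulVVW(z, x []Word, y Word) (c Word) {
	for i := range x {
		hi, lo := bits.Mul(uint(x[i]), uint(y))
		lo, cc := bits.Add(lo, uint(z[i]), 0)
		hi += cc
		lo, cc = bits.Add(lo, uint(c), 0)
		hi += cc
		z[i] = Word(lo)
		c = Word(hi)
	}
	return
}

// basicMulSketch shows the conditionalized optimization: the zero-word
// shortcut is taken only in variable-time mode (zcap == 0).
// z must have length at least len(x)+len(y), pre-zeroed.
func basicMulSketch(zcap int, z, x, y []Word) {
	for i, d := range y {
		if zcap > 0 || d != 0 { // evaluation order matters for constant time
			z[len(x)+i] = addMulVVW(z[i:i+len(x)], x, d)
		}
	}
}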

Conditional calculations in constant time

Besides disabling variable-time optimizations when zcap > 0, a standard challenge in implementing constant-time arithmetic is dealing with conditionals that depend on computed data. For example, in each iteration nat.montgomery must test a data-dependent carry flag and perform a subtraction or not depending on its value. To address these situations, in the case of constant-time operation (zcap > 0) the code adopts the standard solution of computing both possible result values and performing a constant-time conditional copy or "select" of one or the other. The new nat.sel method provides such a constant-time select for internal use. Since computing two results and selecting just one of them can negatively impact performance, the code does this only in the zcap > 0 case, and retains the existing conditional behavior when zcap == 0.
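For reference, a word-wise constant-time select can be written as follows (a sketch; the patch's nat.sel presumably applies this pattern across a whole word slice):

package natsketch

type Word uintptr

// sel returns x when v == 1 and y when v == 0, without branching on v.
func sel(v, x, y Word) Word {
	mask := -v // v must be 0 or 1: mask is all-ones or all-zeros
	return (x & mask) | (y &^ mask)
}

Computing both candidate results and masking one away costs roughly twice the arithmetic of a plain conditional, which is why the patch retains the branch when zcap == 0.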

Effects on Architecture-specific Arithmetic Code

In most cases, it appears that supporting constant-time operation in the math/big package should require no modifications to the optimized arithmetic code in architecture-specific assembly, because that code tends to be already designed to use architecture-specific mechanisms for handling carries and such without requiring conditionals.

Some of the generic versions of the arithmetic primitives - e.g., addWW_g - had variable-time implementations due to conditional carry handling, but I fixed those to use the same constant-time carry/overflow handling logic that the corresponding generic vector primitives (e.g., addVV) were already using.

I encountered only one "missing" arithmetic primitive needed for constant-time arithmetic, namely constant-time comparison. For this purpose I added a 'cmpVV_g', which essentially performs the same operation as a 'subVV_g' except (a) it discards the subtraction's normal numeric output, and hence does not need a target vector; and (b) in addition to computing the carry from the subtraction, it also produces an indicator of whether the result is equal to zero or not.

The current patch provides only the generic cmpVV_g implementation and does not add a corresponding architecture-specific cmpVV for any architecture. Doing so should be quite easy - a relatively straightforward variation of each architecture's subVV code - but it did not seem like a high priority at the moment.
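For illustration, a generic constant-time comparison in the spirit of cmpVV_g could look like the following (a sketch written with today's math/bits for brevity; the actual patch works on math/big's Word type and its own carry-handling helpers):

package natsketch

import "math/bits"

// cmpVV performs the subtraction x - y like subVV but discards the
// difference words. borrow != 0 iff x < y; nonzero != 0 iff x != y.
// Timing depends only on the (equal, by assumption) operand lengths.
func cmpVV(x, y []uint) (borrow, nonzero uint) {
	for i := range x {
		var d uint
		d, borrow = bits.Sub(x[i], y[i], borrow)
		nonzero |= d // accumulate every difference word
	}
	return
}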

Limitations and Caveats

Once again, the current patch is intended to be a starting point for discussion. It's definitely not yet a perfect or complete solution to the problem of supporting constant-time operation for cryptographic use. In particular, the current patch has at least the following known limitations/caveats (and probably others that I've missed or forgotten to mention):

  • It supports constant-time arithmetic only for certain big.Int operations, currently listed in the godoc for Int.SetBitCap. (I considered adding godoc notes to each of the relevant operations individually about whether or not they currently support constant-time operation, but that seemed unnecessarily invasive at least for now.)

  • The code is currently documented as guaranteeing constant-time operation only when the target's zcap > 0 and all operands and the result are non-negative integers. This minimizes the invasiveness of changes required to all the sign-handling code throughout int.go, and is fairly compatible with the needs of most cryptographic code, which tends to use modular arithmetic on non-negative integers anyway. Most of the sign-handling logic in int.go could in principle be made constant-time if deemed necessary, but sometimes at the performance cost of computing two results and selecting one of them (e.g., the equal-sign and opposite-sign cases in Add and Sub), and given the typical needs of cryptographic computations it's not clear that constant-time sign-handling would be worth either the development effort or performance costs.

  • Although the patch adds constant-time arithmetic support for most of the Int and nat arithmetic operations important for crypto (Add, Sub, Mul, Exp), the remaining "crypto-critical" operation that has not yet been converted is Div, needed most importantly for its Mod (modular reduction) functionality. This should not be that hard to achieve for the case where only the dividend may contain sensitive data and not the divisor/modulus, and I believe this is the most common situation in crypto code (where the dividend typically contains secrets but the divisor/modulus is a public parameter such as the field size or group order). Making Div constant-time in both operands (dividend and divisor) may be significantly trickier, since Knuth's division algorithm strongly depends on the divisor being normalized such that its most significant bit is 1. But for most cryptographic purposes it seems acceptable for Div to be constant-time only in the dividend, provided of course that this is clearly documented.

  • Another operation currently missing constant-time support is Int.Bytes(). This case presents no significant technical challenge, but rather the API question of how exactly it should behave when the Int is configured for zcap > 0. I'm inclined to think it should always return a byte-slice exactly large enough to hold the number of bits set by SetBitCap() (see the sketch after this list). But implementing this behavior would mean that SetBitCap could no longer round up to a whole number of words internally; it would have to store the bit-capacity in terms of bytes or bits, with calls down into nat functions doing the conversion.

  • While all of the standard tests and benchmarks work in the case of variable-time operation, and I have added some tests specifically for constant-time operation (e.g., tests for arithmetic on non-normalized nat inputs), there are not yet extensive tests or benchmarks specifically for the constant-time case.

  • In the longer term, a related wishlist item would be to have a "modular integer" type, e.g., ModInt, which gets configured for modular arithmetic using a fixed modulus and automatically provides constant-time arithmetic support based on the size of that modulus. Such a facility would both make common modular-arithmetic crypto code simpler (by the caller not needing to call Mod after every operation), and would facilitate more optimized but still general-purpose arithmetic relying on caching information about a fixed modulus, such as Barrett-style modular reduction. The DEDIS advanced crypto library already implements a "modular integer" type of this form (see nist.Int at https://godoc.org/github.com/dedis/kyber/nist), but it is currently non-constant-time because it depends on big.Int, which is non-constant-time. But figuring out whether and what kinds of extensions would be appropriate specifically to support modular arithmetic is probably best left for a separate issue and discussion.
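As an aside on the Int.Bytes() item above, the proposed fixed-width behavior can be sketched as follows (fixedBytes is a hypothetical helper; note that Int.Bytes itself remains variable-time today, so this shows only the intended output shape, not constant-time behavior):

package example

import "math/big"

// fixedBytes serializes x into a buffer whose length derives from capBits
// rather than from the numeric value of x, left-padding with zeros.
// Assumes x actually fits in capBits bits.
func fixedBytes(x *big.Int, capBits int) []byte {
	out := make([]byte, (capBits+7)/8) // size set by public information only
	b := x.Bytes()                     // variable-length (and variable-time) today
	copy(out[len(out)-len(b):], b)     // right-align the magnitude
	return out
}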

Apologies for the lengthy writeup. At any rate, I hope this discussion and the corresponding preliminary patch illustrate that, although by no means trivial, it appears feasible to make math/big support constant-time operation for cryptographic use with (a) minimal public API change, (b) no significant performance penalty for non-cryptographic variable-time operation, and (c) internal changes that are largely confined to the nat.go and int.go files. Further, even with the patch's current known limitations, I would suggest that it already makes math/big a lot safer for cryptographic uses than it currently is, and hence represents a step forward in security.

Comments/discussion welcome. Thanks.

@ALTree ALTree changed the title math/big: support for constant-time arithmetic proposal: math/big: support for constant-time arithmetic Jun 13, 2017
@gopherbot gopherbot added this to the Proposal milestone Jun 13, 2017
@bradfitz (Contributor) commented:

/cc @griesemer

@griesemer (Contributor) commented Jun 13, 2017

I'm not a security/crypto expert and I don't have a good understanding of the viability of timing-based side-channel attacks specifically when using Go's crypto routines based on math/big. My prior understanding was that there are still bigger fish to fry elsewhere, but I will need to leave that judgement to @agl.

That said, adding an extra mode to math/big for all core operations is going to hugely increase the chances for errors, because it's pervasive and affects which optimizations can be done and which can't. Any current and future local optimization may cause timing differences depending on operand values and will have to be considered. The patch mentioned in the proposal is pretty pervasive, supporting my point. There may also be stronger effects on timing, such as the memory layout of operands, complicating matters and hiding/obfuscating timing differences. In short, I am leaning against such changes to math/big.

If the goal is to provide bignum operations whose execution times depend only on the length of the operands, we should write a new/separate library specifically built from the ground up with that goal in mind. It's going to be cleaner, simpler, and easier to maintain. Since it's written from the ground up, every local operation can be tested from the start with timing considerations in mind. It will only have to contain the core operations needed for crypto, and it can have an API optimized for that purpose. If there are large amounts of unconditional duplication of code, we can still factor that out. For instance, I suspect that many of the low-level assembly cores can be shared unchanged. Much of the testing can be reused or perhaps even factored and shared (the results must be the same). Those are the things where much of the implementation time goes.

Thus, alternatively, I suggest instead that we come up with a crypto-specific library containing a minimal bare-bone set of operations. For anything not timing-critical, we convert operands to math/big values and use those operations instead (e.g. formatting and printing, which is part of the code with a lot of complexity).

@bford (Contributor, Author) commented Jun 14, 2017

Putting constant-time big-arithmetic support in a separate package or library designed specifically for crypto is certainly a reasonable alternative to consider, as I mentioned in my writeup. However, to expand a bit on the reasons I find that approach less appealing:

  • Such a separate package/library - let's call it newbig.Int for the sake of argument - could benefit from, and would really like to make use of, all the architecture-optimized arithmetic primitives in math/big, which currently aren't exposed publicly. If the separate-package newbig.Int alternative is preferred, would you be open to a change that exposes those optimized arithmetic primitives (perhaps as a sub-package, e.g., math/big/arith, to avoid adding low-level clutter to the math/big API itself)? That might be a worthwhile change anyway, which I would support - but keep in mind that it would actually represent a much bigger expansion of the public Go API than the change proposed above.

  • Such a specifically-for-crypto newbig.Int package would probably not be "constant-time-only" but might still want to support variable-time arithmetic as well, because real crypto code tends to include a mixture of secret/sensitive and public, non-sensitive calculations, and optimization practice often uses faster, variable-time arithmetic for the latter. A well-known example is DJB's now-ubiquitous hand-optimized implementation of the Ed25519 signing scheme, which @agl ported to Go: it includes two separate implementations of scalar multiplication, one constant-time for signing (which needs the sensitive private key), the other as-fast-as-possible variable-time for signature verification (not sensitive because it needs only the signature and public key). The upshot is that newbig.Int would substantially overlap with, and probably look a lot like, big.Int (or perhaps more like the internal big.nat), likely starting as a fork of the big.Int code itself with changes something like those I proposed. Is the specialization for crypto worth the longer-term burden of maintaining similar-but-different big.Int and newbig.Int code?

  • If you want the crypto in the Go standard library ever to have a safe constant-time implementation, then newbig.Int, whatever it looks like and however it's implemented, will also have to be part of the Go standard library so that the standard crypto packages can use it. This would seem to make a forked-and-modified separate package just for constant-time arithmetic still more undesirable from a Go runtime support size and maintainability perspective. (Of course newbig.Int could be designed to share common underlying code with big.Int - e.g., the optimized arithmetic primitives - but there would probably still be a lot of duplication of the code in nat.go if that's forked-and-modified rather than reused as proposed in my change.)

  • Many crypto packages in the Go standard library are already tied to big.Int through their now-frozen type definitions. For example, see Parameters and Public/PrivateKey in crypto/dsa; PrivateKey, r, s in crypto/ecdsa; everything in crypto/elliptic; PrivateKey in crypto/rsa. To make Go's implementations of these crypto primitives safe, the standard library would have to keep using big.Int at the API level but convert to/from newbig.Int behind the scenes before/after doing arithmetic. This is undesirable for at least three reasons I can think of: (a) it's cumbersome just from an implementation perspective; (b) it means Go's standard crypto APIs will still be implicitly encouraging users to use big.Int rather than newbig.Int for (some) crypto purposes; and (c) it may still be unsafe from a constant-time security perspective. In principle, loading a secret key into a variable-time big.Int even briefly, to call a legacy Go crypto API before the implementation converts it to a constant-time newbig.Int, still leaves a potential timing-channel leak, because the conversion to/from big.Int takes time that varies with the numeric value of the private key, even if all the arithmetic itself is constant-time. Granted, I expect this residual timing channel to be pretty hard to exploit - but we've been surprised a few times before when no-way-ever-exploitable timing channels were successfully exploited.

@nikitaborisov commented Jun 14, 2017

I'm not sure about the question of a separate library or not, but I'd advocate for at least a separate type, ConstInt, on which all operations must be constant-time. The type would make future APIs much cleaner, since they would be explicit about which numbers require constant time and which do not. It also creates minimal interference for existing code, since existing code doesn't change.

As far as creating a maintenance headache from a fork, I think this is unavoidable, because any changes to non-constant-time big.Int implementations must be evaluated from a constant-time perspective before being merged in.

@nikitaborisov commented:

I'm thinking more about the legacy code problem, and I think the 3 problems you list exist in your proposal, too:

  • (a) The implementation of, say, crypto/rsa would need to convert a non-constant-time big.Int into a constant-time big.Int by calling SetBitCap()
  • (b) Without placing a requirement on big.Int parameters to be constant-time, you are still implicitly encouraging crypto users to use big.Int
  • (c) It may be unsafe since a conversion still exists

The only alternative I see is changing the code implementing existing crypto APIs to return an error if they are called with a secret parameter that does not have a bit capacity set. But at this point you're effectively changing the API anyway (old code breaks), and you might as well enforce the change through the type system rather than with a runtime error.

@bford (Contributor, Author) commented Jun 14, 2017

Thanks Nikita for weighing in with your thoughtful comments.

I don't have any strong objections to introducing a new type such as ConstInt, but just want to point out the following downsides:

  • This would inevitably expand the API a lot more than just adding one new property to big.Int (a whole new type with a bunch of methods, not just adding a single property to an existing type).

  • It would make it rather awkward to mix ConstInts with variable-time big.Ints when appropriate. For example, the standard DJB-optimized implementation of Ed25519 uses constant-time scalar multiplication only for signature generation (which requires the sensitive private key) and faster variable-time scalar multiplication for signature verification (which uses only non-sensitive public values). With the SetBitCap() proposal, well-designed crypto code can trivially preserve this practice, e.g., by using SetBitCap() only in the signing code involving secret values and leaving it out of the verification code, so that verification can use all the variable-time performance optimizations. Separate types, though conceptually appealing, become awkward whenever arithmetic involves both "sensitive, constant-time" and "non-sensitive, variable-time-allowed" values: big.Int.Add(x, y) will take only big.Ints as x, y, and ConstInt.Add(x, y) will accept only ConstInts, even though they're semantically the same thing. For example, should a public key (in itself generally a non-sensitive value) be stored in a ConstInt or a big.Int? If big.Int, then signature verification can be done conveniently using big.Int arithmetic, but when generating a signature I have to use a ConstInt first for the scalar multiplication with my private key (which obviously needs to be a ConstInt) and then convert it to a big.Int. If public keys are ConstInts, on the other hand, then I either have to use ConstInts for everything, giving up the variable-time performance optimizations (and convenience) of plain big.Ints for non-sensitive operations like signature verification, or I have to convert the ConstInt public key to a big.Int before verification. All doable, of course, but a bit of a pain and seemingly unnecessary.

  • Several of the existing Go crypto APIs (e.g., crypto/dsa, crypto/ecdsa, crypto/rsa) would still be left with the problem that private keys are already frozen as being declared of type big.Int, not ConstInt as they would need to be.

Regarding the legacy code problem, responding to your points (a)-(c):

(a) No, the implementation of crypto/rsa would not need to (and should not) call SetBitCap() on big.Ints passed in to it (e.g., parameters and public keys), only on the big.Ints it uses internally as destinations for big.Int arithmetic it performs itself. Remember SetBitCap() configures the size of a result and hence is called only on a destination big.Int. A function that receives a big.Int parameter should not call SetBitCap() on any of its big.Int parameters because that would amount to modifying its inputs, which is against API conventions. A function like RSA encryption/decryption is only responsible for (and allowed to) call SetBitCap() to configure the sizes of the result big.Ints it computes, either as intermediate results or to pass back to the caller as the ultimate answer. Most importantly, since the crypto/rsa, crypto/dsa, etc implementations can insert the appropriate SetBitCap calls internally to the big.Ints they generate as part of their operation, none of the public type signatures need to change in order to support constant-time implementations.

(b) The documentation for the calling conventions of APIs like crypto/rsa should of course be updated to recommend (ideally require) that callers providing sensitive inputs such as RSA private keys configure the big.Ints representing those secrets via SetBitCap() before loading secrets into them or passing them to the crypto/rsa module. The constant-time conversion will thus not be complete or water-tight until callers of APIs like crypto/rsa heed this advice and make sure the big.Int inputs they pass to the crypto APIs are properly configured via SetBitCap(). However, (a) many of these big.Ints will in fact have been generated and handed to the caller by some other method of the same crypto API - e.g., the RSA key generator or the standard ASN.1 key-unmarshaling code - which can be updated to produce constant-time big.Ints silently, without the application/client code needing to change at all; and (b) even where the calling code does its own generation (unmarshaling, etc.) of the big.Ints used as inputs to the crypto API, quick non-constant-time operations like marshaling and unmarshaling tend to be much harder to exploit than the compute-intensive "core arithmetic" operations like exponentiation for RSA or scalar multiplication for ECC algorithms, so we gain a lot of security by constant-time-hardening the implementations of those algorithms even if that doesn't automatically mean complete constant-time-hardening of all legacy applications using those APIs.

(c) In the SetBitCap proposal, no "conversion" is required of big.Int inputs to crypto APIs such as crypto/rsa, even internally, whether or not the calling application has properly configured constant-time input big.Ints by calling SetBitCap(). The only difference this creates is (a) what size []Word slice internally represents those input big.Ints, and (b) whether those []Word slices are normalized (most-significant word always nonzero) or non-normalized (most-significant word may be zero). The current big.Int (and internal big.nat) code already, unavoidably, handles the "hard part" of supporting all input combinations in the general case where one operand is longer than the other (adding a long big.Int to a short one or vice versa, etc.); if you delve into the big.nat code you'll see that quite a bit of logic is dedicated to that. That complexity is all required regardless and is already present. The only new complexity is the need for the internal big.nat code to handle not just mixed-length but also non-normalized []Word slices as parameters. There were of course some normalization assumptions in the code, but those are not numerous and are pretty easy to address, as my proposal already discussed. Dive in and look at the code if this is unclear. But at any rate, the point is that no conversion (type conversion, format conversion, or anything else) is required in general, even internally, to ensure that any big.Int target can accept any combination of "constant-time" and "variable-time" big.Ints as inputs.
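To make point (a) concrete, here is a sketch of a crypto-package-internal computation that sizes only its own destination (the function name and its use of Exp are illustrative, not the actual crypto/rsa code; SetBitCap is the proposed API):

package example

import "math/big"

// decryptSketch computes c^d mod n. SetBitCap is called only on the
// destination the function allocates itself; the caller's big.Ints are
// never modified, so no public type signature has to change.
func decryptSketch(c, d, n *big.Int) *big.Int {
	m := new(big.Int)
	m.SetBitCap(n.BitLen()) // result size derives from the public modulus
	return m.Exp(c, d, n)
}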

@bford (Contributor, Author) commented Jun 15, 2017

Apologies if the initial proposal was a bit overly long and impenetrable. Based on the first few reactions I've seen, perhaps it's best if I back up and ask those interested (and still paying attention) to express their sentiments on a few high-level questions of priorities, while trying to pull apart a few related issues. Either simple yes/no responses, or more elaborate responses, would be fine.

  1. Is it a priority to fix the public-key crypto in Go's standard library to support constant-time operation? In other words, is it worthwhile at all to address the potential timing-channel risks in this public-key crypto and the software that depends on it (e.g., Go's built-in TLS and the applications depending on it)?

  2. If the answer to question 1 is yes, then is it a priority to address constant-time operation in a way that is compatible with the currently-frozen API signatures of the relevant Go public-key crypto packages that rely on the 'big.Int' type (e.g., crypto/rsa, crypto/dsa, crypto/ecdsa, crypto/elliptic)?

  3. If the answer to question 2 is no, then should those public-key crypto packages in the Go standard library be deprecated, i.e., marked "do not use", on the basis that their public-key arithmetic cannot and will not be made cryptographically constant-time while dependent on big.Int?

Thanks.

@griesemer (Contributor) commented:

I've now just reread the entire proposal and all comments. First, let me answer a couple of your earlier questions:

Regarding sharing of the underlying assembly routines: If the decision were to create a separate package for constant time operations, it would be fairly easy to share the core routines w/o exposing more in the existing APIs by creating an internal "arith" (or similar) package under the math directory. Such a package, while having a public interface, would not be accessible outside the std library. We do this already in various places in the std library.

Regarding your concern about the compiler's conditional evaluation logic (you bring the example of the modified basicMul): The compiler must preserve semantics and thus most likely will evaluate in sequence. There may be optimizations but they will have such a small impact on exact runtime that I doubt they can be reliably measured unless they happen in an innermost loop (where we would want to avoid an extra condition in the first place for general performance reasons). Unless you're on a completely and reliably "quiet" system doing nothing else but the crypto computations, runtimes will fluctuate by several percent just due to cache misses, memory use, etc.

I can definitely see the attractiveness of being able to use the existing math/big library largely unchanged and just have time-sensitive clients set the appropriate flags. I am worried that this "dynamic" approach is extremely sensitive to user and implementation error. I would much rather prefer this were somehow statically enforced. As such, I do have sympathy for @nikitaborisov's suggestion of a constant-time integer type. Adding that to package big might be an easier solution than writing a new library entirely (as I suggested in my first comment). It also may make interoperation between regular and constant-time integers easier. But it still leaves the issue of having to change all the clients in perhaps backwards-incompatible ways.

But going to your big 3 questions at the end:

I cannot answer your first question. As I mentioned before, in the past @agl thought there were bigger fish to fry before tackling this. I think crypto experts will have to weigh in here with a pretty concrete risk assessment: Would it be easily possible to stage an attack in a real-world scenario, with a high chance of success, by exploiting the timing side-channel in Go's crypto code? (Or are there other timing issues that completely dominate fluctuations in the crypto performance?)

If the answer to that is yes, then your question 1 would probably have to be answered as yes given the increasing popularity of Go.

I think we need to get an answer here first before we dive deeper into the problem.

@rsc (Contributor) commented Jul 6, 2017

Hi Bryan! Thanks very much for this proposal. Personally, I think it's important to make sure that Go can support arbitrary crypto code in this way, but I don't know how to demonstrate that it's important. If someone were to challenge us on that point, are there examples of interesting or important projects where people are using Go that depend on having constant-time arithmetic? I think that kind of evidence of significance would help tilt the scales a bit here.

I also wanted to write to let you know that due to travel and vacations and the like, essentially all the relevant people on the Go team will be away for all of July and likely the beginning of August. I intend to try to move the process along starting in mid-August or so; if you don't hear from us before then, we're not ignoring you. Thanks again.

@bford (Contributor, Author) commented Jul 6, 2017

Thanks Robert and Russ for looking at the code and proposal.

Robert: on the dynamic vs. static selection of constant-time operation, I agree that in principle static type-based enforcement would be preferable. But note that the constant-time arithmetic requirement is not just a boolean on/off decision: in the "on" case, it's necessary to configure the result size, which necessarily varies with the cryptosystem and often with particular parts of a cryptographic operation. For example, discrete-log and elliptic-curve cryptosystems typically have both field arithmetic and scalar/exponent arithmetic, which often use vastly different-size moduli, so constant-time operations for one need to be configured with rather different-size result buffers than constant-time operations for the other. Sometimes values cross the two domains, e.g., the way the elliptic-curve x coordinate in ECDSA gets injected into the scalar arithmetic, or the way in Paillier cryptosystems some operations are done mod n and others mod n^2. To get this "right" in a fully statically type-checked way, Go would need parameterized types, at least types that can be parameterized by integer constants (representing result sizes). And even that would probably be cumbersome unless the type system had pretty sophisticated facilities for statically type-checked conversions when arithmetic needs to cross different sizes and moduli. Go currently has nowhere near the type-system sophistication for that, and even if generics are in the works, this kind of heavy dependence on sophisticated typing doesn't really seem like Go's style.

The next-best option, which seems like a more reasonable tradeoff and more aligned with Go's style, would be something like the "modular integer" type in our Kyber crypto library:

http://godoc.org/github.com/dedis/kyber/nist

This basically represents a type similar to big.Int in operation (and our implementation uses big.Int underneath) but parameterized with a constant modulus, which automatically gets applied to all operations performed via this type. Given that a lot of (though not all) big-num crypto arithmetic tends to be done modulo something or another, and the modulo parameter can be taken as implicitly defining the result size, that makes it potentially natural simply to specify that this particular "ModInt" type (pick your favorite name) always operates in constant time. Then the difference between "big.Int" and "big.ModInt" can be more-or-less taken as the statically-enforced boundary between variable-time and constant-time arithmetic. I think it would be extremely reasonable and likely desirable to add such a "modular-int" type to the Go library at some point, and I had intended that to be an eventual "followup step" to the above proposal.

But it still wouldn't solve the complete problem, since as I mentioned much but not all bignum crypto code is done modulo something or other, and there would still be the problem of the legacy crypto interfaces that are already hardwired to use big.Int rather than big.ModInt or whatever the new type might be called. So the "complete" solution I would prefer to see eventually would be to have the new big.ModInt type, which most (new) crypto code could use for its arithmetic performed modulo something or another and would be constant-time by default - but also to have a way to configure big.Int for constant-time operation as proposed above, both to support legacy uses of big.Int and to provide a "lower-level interface" to the constant-time arithmetic when needed. The new big.ModInt type could then be built quite simply and easily atop the proposed big.Int with configurable constant-time support.
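A sketch of what such a big.ModInt could look like, layered on the proposed constant-time big.Int (all names here are hypothetical):

package example

import "math/big"

// ModInt pairs a value with a fixed public modulus; the modulus implicitly
// defines the result size, so operations can default to constant time.
type ModInt struct {
	v, m *big.Int // value and fixed public modulus
}

func NewModInt(m *big.Int) *ModInt {
	v := new(big.Int)
	v.SetBitCap(m.BitLen()) // proposed API: fix the buffer size up front
	return &ModInt{v: v, m: m}
}

// Add computes (x + y) mod m; the reduction is applied automatically, so
// callers never need to call Mod themselves.
func (z *ModInt) Add(x, y *ModInt) *ModInt {
	z.v.Add(x.v, y.v)
	z.v.Mod(z.v, z.m)
	return z
}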

@bford (Contributor, Author) commented Jul 6, 2017

Russ: what exactly would qualify as "examples of interesting or important projects where people are using Go that depend on having constant-time arithmetic"?

For example, all of my group's projects, since I moved to EPFL two years ago, are in Go and depend on crypto that really needs constant-time big-int arithmetic. These projects include:

All of these projects build on, and drive the development of, our Kyber advanced crypto library for Go. Kyber provides a general framework for advanced elliptic-curve crypto including support for general zero-knowledge proofs, verifiable shuffles, [verifiable/joint/publicly-verifiable] Shamir secret sharing in cryptographic contexts, etc. All of this really needs constant-time arithmetic.

We are soon going to release Kyber v1 (currently a branch), which we think is fairly stable, solid, and usable - but it has an important limitation: it will initially support only one elliptic curve, Ed25519, in the default "recommended for real use" configuration. This is because the Ed25519 curve arithmetic implementation has constant-time arithmetic hard-coded and hand-optimized for the Ed25519 curve by DJB and ported to Go by @agl. Our Kyber infrastructure actually supports many other curves already, and we'd like to support more (e.g., the extreme-security Ed448 curve recently standardized in RFC 8032) - but the implementations of the arithmetic for all the other curves depend directly and/or indirectly on Go's big.Int. Therefore, we can't in good conscience recommend that people actually use any of these alternate curves, including the nominally "more secure" ones, while big.Int has no constant-time support, so we decided to leave them only conditionally-compiled and disabled by default in the upcoming Kyber v1 release.

Of course we can - and might - just rewrite Kyber so that it doesn't rely on Go's built-in big.Int at all. But then we'd either have to lose the use of all the nice architecture-specific assembly optimizations that the big.Int and underlying big.nat modules include, or else fork the whole mess and maintain a separate copy of all that, and neither of those choices is particularly appealing.

Backing up, if our own projects don't count as "interesting or important" enough, it shouldn't be hard to find plenty of other examples. Go is increasingly popular in blockchain and distributed-ledger projects of many types. For example, there is a Bitcoin implementation in Go - though I'm not sure how functional or popular this particular one is. Besides Bitcoin's standard existing use of the secp256k1 curve (which, unlike P256, gets no constant-time special-casing in Go's standard crypto library), Bitcoin has been considering adopting aggregate Schnorr signatures for transaction signing, which again really needs constant-time bignum arithmetic. Hackers have gotten extremely good at exploiting the smallest side-channel leaks of any kind, especially when the "universal bug bounty" of Bitcoin wallets is the prize. Similarly, one of the standard implementations of Ethereum is in Go, and I understand this implementation is quite popular. I expect there are many other examples; these are just the ones that come to mind immediately.

Note that @nikitaborisov, who commented earlier, is an extremely well-known systems/applied-crypto person (you should know his work if you don't already :) ), and my reading of his comments is that he had no argument with the necessity and importance of making the crypto arithmetic constant-time, but was just (understandably) unsure the best way to go about implementing it. But I'll let him speak for himself if he wants to comment further.

Perhaps @tomrist - "Professor Side-Channel" himself ;) - might also like to weigh in?

@bford (Contributor, Author) commented Jul 6, 2017

P.S. Also, I understand the vacation schedule issues - no big hurry on my end either, as we have one solid curve to work with (Ed25519) for now and we're largely busy with other higher-priority things as well for now. But I would really love to see some kind of progress on this problem within the next few months at least. Thanks again for reading the code and proposal.

@rsc (Contributor) commented Jul 6, 2017

@bford, thanks for all the examples. That's fantastic. I'll grab @agl and @griesemer about this in August.

@agl (Contributor) commented Jul 7, 2017

Constant-time arithmetic has been a long-standing desideratum. At the moment, it's probably the case that an attacker who can position themselves on the same machine as a Go process doing many crypto operations (e.g. a TLS server) can extract secrets.

(I've not done all the work to show that this is possible, but there's a large literature on these sorts of attacks and there's no reason to believe that it wouldn't apply to some part of Go. expNNMontgomery, for example, does not do a constant-time read of the table of powers. Perhaps RSA is saved by blinding, but the exponent is still constant.)

It is a common design in crypto libraries to have bigints that never have the most-significant limb be zero, like math/big. However, that's because they, like Go, were built on a more general-purpose library. The solution in those cases is generally to have a magic modexp operation that is constant-time and operates without the usual library methods. Apart from that, they just hope that the remaining leaks are small enough not to worry about. Maybe that's practically true, but side-channel attacks keep getting better.

Modern crypto primitives have generally taken the approach of abandoning generic bigint libraries for fully-custom code. That's why we have P-256 and X25519 implementations that are constant-time and don't use math/big. The couple of examples of constant-time bigint libraries that I'm aware of are both recent: BearSSL's, and a library mentioned in papers by the Everest folks.

If we were to try to extend math/big for constant-time then a few points came to mind when reading the above:

  • math/big has a large API, so there would be a large testing surface to ensure that every function still gives the correct answers when given a "capped" value for each input. I think this is solvable with unit tests, but it would need some thought.
  • Wherever the code lives, we don't have any way to test whether we have achieved constant-time behaviour, especially over time as the code is changed. This is a hard problem in general, although I've been able to use a patched valgrind for this purpose for C and assembly code in the past. Perhaps something similar can work for Go, but we would hopefully have thought about this aspect of maintenance beforehand. (Depending on code review is viable if we conclude that it's the best we can do, but it is a burden.)
  • Is a capped value to be taken as a generic "constant-time" flag? There is more value-dependent logic than just normalisation so what would trigger, say, a modexp that's constant-time in the exponent? A capped exponent value, even though that's an input?

Should this be something that Go takes on, I have no initial strong feelings about extending math/big vs. a separate package, but it occurs to me that converting *big.Ints at package interfaces need not be too onerous, since Bits would allow the value of an alternative type to be aliased with a *big.Int.

@bradfitz (Contributor) commented Jul 7, 2017

@aclements, @randall77, @mdempsky, @josharian, how hard would it be to add a flag to the compiler to increment a counter on every memory load, even if it's slower?

The idea is that we'd run a builder in that mode and the math/constantbig (or whatever name) package would conditionally enable constant-time testing if those counters were available. This would remove the need for a patched valgrind.

@aclements (Member) commented:

> This would remove the need for a patched valgrind.

If I understand correctly, the patched valgrind is doing much more sophisticated things than counting memory loads. But maybe counting memory loads is an okay baseline (@agl?).

@josharian (Contributor) commented:

Patching the compiler wouldn't handle assembly. I suppose we could patch the assembler. We could instrument branches (basic blocks) as well as memory loads. Difficulty is...medium-hard? There's also deciding how it interfaces with the test framework, which is non-obvious. It'd be good to know what exactly the patched valgrind is doing, as a baseline.

@aclements (Member) commented:

For counting memory loads and branches, I wonder if we could use performance counters instead? There might be a tiny bit of noise, but assuming the test stresses things in ways that would otherwise have huge disparities, this might be much simpler.

@josharian (Contributor) commented Jul 7, 2017

Or we could just extend the coverage tool to instrument assembly as well as Go code. (Note that memory loads/stores happen in basic blocks.) Then write a driver that executes a function with different inputs and reads the coverage results and looks for discrepancies. If we're worried about basic counts missing bugs, we could also sample coverage sequences at fixed intervals and fail if any sequence mismatched.

@bradfitz (Contributor) commented Jul 7, 2017

@aclements, perf counters definitely seem easier to start with. Unfortunately GCE (where most of our builders live) doesn't support perf counters, last I checked. But we could run a dedicated machine or two just for those tests.

@agl
Contributor

agl commented Jul 7, 2017

The constant-time property is that, for all inputs, the trace of instructions fetched and memory addresses read is constant. (In an abstract fetch-execute machine.)

Valgrind basically already checks that for uninitialised memory: branching on uninitialised memory or indexing based on it is an error. Thus we can tell Valgrind that our secret input is "uninitialised" and get it to check constant-timeness for us.

(At the end of the computation you have to "bless" the secret memory as ok to read from, otherwise you could never, say, output a signature that you have computed. That is done at a point in time where there is a sufficiently large "cliff" that it's safe to do so. The attacker, in contrast, wants to climb a gradual hill to extract the secret bit-by-bit.)

I don't know whether perf counters include the noise of context switches, prefetching, branch prediction etc. If so, that would be unfortunate. Also, it wouldn't detect secret-based memory indexes unless some of those branches were triggered by random inputs and the different branch count could be observed.

Perhaps the way to go is to hack up Valgrind until it's happy with Go binaries and then use the trick above.

@josharian
Contributor

The constant-time property is that, for all inputs, the trace of instructions fetched and memory addresses read is constant.

Thanks, that's helpful. The need to record memory addresses read makes extending go tool cover less appealing.

Expanding on the idea of teaching obj (the assembler) to instrument binaries to generate exactly this trace, I guess it'd look very roughly like:

package cttrace

import "unsafe"

const bigEnough = 10000000

var (
	trace [bigEnough]uintptr
	end   = unsafe.Pointer(&trace[0])
)

// used reports how many entries the instrumented code has written.
func used() uintptr {
	return (uintptr(end) - uintptr(unsafe.Pointer(&trace[0]))) / unsafe.Sizeof(trace[0])
}

// Reset zeroes the recorded entries and rewinds the write cursor.
func Reset() {
	n := used()
	for i := uintptr(0); i < n; i++ {
		trace[i] = 0
	}
	end = unsafe.Pointer(&trace[0])
}

// Check panics if bigEnough wasn't big enough. (A real version would
// reserve guard space, since by this point the overflow has happened.)
func Check() {
	if used() > bigEnough {
		panic("cttrace: trace overflow")
	}
}

// Trace returns the recorded entries, up to the end pointer.
func Trace() []uintptr {
	return trace[:used()]
}

For specified packages (or functions?), at every basic block, emit (in assembly): MOVQ IP, (cttrace.end); ADDQ $8, cttrace.end (store through the cursor, then advance it by one 8-byte entry). At every memory access, emit the same, but using the address being read/written instead of the instruction pointer. (If we want to distinguish IPs, reads, and writes, make the trace entries structs and record a kind field.) Test by writing a regular Go driver that does: (1) call cttrace.Reset, (2) call the function in the package of interest, (3) save cttrace.Trace(), or maybe a hash of it. Compare all traces.
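
Such a driver might look something like the sketch below; cttrace is the hypothetical instrumentation package sketched above, and addCapped stands in for whatever function is under test:

package bigct_test

import (
	"reflect"
	"testing"

	"cttrace" // hypothetical import path for the instrumentation package
)

// TestTraceIsInputIndependent checks that the recorded trace is
// identical across inputs with very different values and bit patterns.
func TestTraceIsInputIndependent(t *testing.T) {
	inputs := []uint{0, 1, 42, ^uint(0)}
	var want []uintptr
	for i, in := range inputs {
		cttrace.Reset()
		addCapped(in) // hypothetical function in the package of interest
		cttrace.Check()
		got := append([]uintptr(nil), cttrace.Trace()...)
		if i == 0 {
			want = got
			continue
		}
		if !reflect.DeepEqual(got, want) {
			t.Fatalf("trace for input %d differs from input 0", i)
		}
	}
}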

Definitely more complicated than performance counters, though. And maybe more complicated than making Valgrind work with Go binaries.

@bford
Contributor Author

bford commented Jul 8, 2017

Thanks @agl for weighing in; all your points are right on. A couple of follow-up comments:

Modern crypto primitives have generally taken the approach of abandoning generic bigint libraries
for fully-custom code.

Agreed, and for the record I don't have any objections to such fully-custom implementations, especially for extremely popular common-case crypto algorithms like DH and EdDSA. We're using those too, as mentioned above. But I think it's pretty sad if fully-custom code is (and remains) the only way to get plausibly-secure implementations of bignum-based crypto algorithms. Getting our crypto-software ecosystem entrenched in pervasive dependencies on fully-custom code for side-channel security creates even higher artificial barriers to the use and deployment of new curves and schemes that in principle provide important security or functionality advantages (e.g., Ed448 or other "spinal-tap-grade security" curves; pairing-based curves for all the wonderful gizmos like accumulators and zk-SNARKs that they enable; lattice-based crypto for fully-homomorphic encryption and post-quantum paranoia...). It would be much better all-around if we could use fully-custom code only to optimize the most performance-critical common-case paths, rather than for all crypto that we hope/expect to be reasonably side-channel secure.

math/big has a large API, so there would be a large testing surface to ensure that every
function still gives the correct answers when given a "capped" value for each input. I think this is
solvable with unit tests, but it would need some thought.

Agreed that this is an important and non-trivial challenge. My proposed starting-point patch ventures a bit in this direction by adding a few spot-checks of this kind to the unit tests, just to get a feel for how they might be done, but it will of course take a lot more work and thought to develop really thorough unit tests for the capped-input cases.

Is a capped value to be taken as a generic "constant-time" flag? There is more value-dependent
logic than just normalisation, so what would trigger, say, a modexp that's constant-time in the
exponent? A capped exponent value, even though that's an input?

My current patch treats the existence of a capped-value configuration in the result object as the "constant-time" flag. And yes, the proposed patch already uses this flag to deal with a bunch of the other value-dependent logic (besides normalization). See for example karatsubaAdd, where I disable the value-dependent optimization of skipping multiplies by zero words when 'zcap > 0' (the flag). And similarly, in several places in karatsuba, where this flag triggers the compute-both-options-and-select approach to value-dependent conditionals. I certainly don't claim that this is necessarily the best way to handle this, but in my limited experience so far it seems to work reasonably smoothly. The constant-time flag could of course be handled with a state element separate from the result size or cap configuration, making them two orthogonal configuration elements, but that would seem to increase complexity (and testing surface) further, and I've tried and failed to think of any good reason one would want the "capped" result size but not constant-time operation or vice versa.
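
To illustrate the compute-both-options-and-select pattern mentioned above (the helper name here is mine, not the patch's):

package ctmath

import "math/big"

// ctSelectWord returns x when v == 1 and y when v == 0, without
// branching on v: mask is all ones or all zeros, so the selection is
// straight-line code regardless of the operands.
func ctSelectWord(v uint, x, y big.Word) big.Word {
	mask := big.Word(0) - big.Word(v)
	return y ^ (mask & (x ^ y))
}

Instead of writing "if cond { z = x } else { z = y }", the caller computes both candidates and selects; crypto/subtle's ConstantTimeSelect applies the same idea to ints.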

Wherever the code lives, we don't have any way to test whether we have achieved
constant-time behaviour, especially over time as the code changes.

Agreed, this is an important problem. Just for the record, as far as I know this "untestability of constant-time behavior" is unfortunately the current state-of-the-art across all languages I'm familiar with, especially in C, where compilers frequently get so overly smart that they have regularly been found to silently "optimize" fully-custom constant-time crypto source code into badly non-constant-time assembly output. Having support in big.Int for what "human eyeball review" says should produce constant-time behavior would be a big step forward for Go, even if that behavior is not yet thoroughly testable all the way to assembly/binary level. In other words, I'd suggest we prioritize fixing the huge, obvious, gaping crevasses between Go's current behavior and nominally source-level-constant-time practices before we worry about (or get blocked on) the further problem of testing, finding, and fixing the likely hairline fractures between what we (humans) think will produce constant-time behavior and what will actually produce constant-time behavior given all allowed compiler optimizations and such.

At the same time, I will be hugely ecstatic and supportive if Go jumps into this seriously enough to incorporate "end-to-end" constant-time behavior testability, using a Valgrind-like or other analysis. That would put Go way ahead of the standard state-of-the-art of support for constant-time crypto in other languages as far as I know (although I'm not an expert on the very latest in compiler/language support for constant-time). Toward this end, several intermediate points might make sense as well. A Valgrind-like analysis of entire traces to ensure constant-time behavior with respect to both instructions and memory-access indexes, as @agl suggests, would certainly be ideal if/when realistically achievable. But even weaker, perhaps easier-to-implement and/or more lightweight sanity-check testing measures might be quite useful as well, such as a simple "number of instructions executed" comparison metric. There are performance counters that can measure the exact number of instructions (and/or branches) executed along a code path, for example (I used them in my Determinator/DeterLand work). This wouldn't catch all accidental non-constant-time behavior, especially memory-index dependencies, but intuitively I think it would be able to catch a high percentage of the accidental non-constant-time behavior that might slip into math/big in particular. Almost all of the "constant-time hazards" I identified so far are conditional/branch or value-size related and thus would likely be reasonably effectively sanity-checkable via a simple instruction count (without implying that this would be an exhaustive test).

So in other words there might be a testing complexity/efficiency versus exhaustiveness balance to be considered here, and the two approaches need not be mutually exclusive. If the full-scale Valgrind analysis is implemented but proves to be too costly to run by default every time anyone runs 'go test ./...' on math/big, but a simple instruction-count or other lightweight sanity-check can be used all the time in that way, then both might be useful in their places. But I'm not the expert on these analysis techniques so I don't take any strong position one way or another.

@AnomalRoil

I recently played around with Dudect (from the paper "Dude, is my code constant time?") and ended up reimplementing it in Go. While this might be considered a "lesser" or "easier" way to go about constant-time checks, since it relies on statistical tests, it makes it easy to test for timing discrepancies in (Go) code thanks to Welch's t-test, and it seems effective in practice.
My goal when I did it was to see to what extent the DecryptOAEP function from Go's rsa package was leaking the number of leading zeros mentioned in Go's code.
My conclusion was that while the leftPad function clearly leaks, the whole DecryptOAEP function suffers too much from background noise for the leak to be of any use.

So, my point here is that while having a constant-time big.Int library in Go would be awesome (and I totally support the idea), it would also probably require a lot of work on Go's existing crypto code to close off all the leaks (i.e., it won't suffice on its own).
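
For reference, the core of such a statistical check is just Welch's t-statistic computed over two classes of timing samples; a minimal sketch:

package dudect

import "math"

// welchT computes Welch's t-statistic for two timing samples, the
// heart of a dudect-style check. A consistently large |t| suggests the
// two input classes have distinguishable timing distributions.
func welchT(a, b []float64) float64 {
	meanVar := func(x []float64) (mean, variance float64) {
		for _, v := range x {
			mean += v
		}
		mean /= float64(len(x))
		for _, v := range x {
			variance += (v - mean) * (v - mean)
		}
		variance /= float64(len(x) - 1) // sample variance
		return
	}
	ma, va := meanVar(a)
	mb, vb := meanVar(b)
	return (ma - mb) / math.Sqrt(va/float64(len(a))+vb/float64(len(b)))
}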

Regarding the latest topic here, I would also be thrilled to see compiler-level support for perf counters for constant-time code execution, allowing unit tests to actually catch timing discrepancies; that would be awesome.
Note that statistical tests of timing discrepancies should probably not be considered a viable option there, since they usually require thousands of traces to be accurate and as such would clog the tests.

@rsc
Contributor

rsc commented Aug 14, 2017

One option I don't see above (but maybe I missed it) is not splitting big.Int or math/big but splitting the API methods so that there would be ConstantTimeAdd, etc. Would that be clearer than reusing cap? Would it be enough? Would it not look awful?

@rsc
Contributor

rsc commented Oct 2, 2017

It's probably getting a bit late for Go 1.10, but it would be nice to make progress on trying to figure out what the plan is. Any replies to my Aug 14 comment?

/cc @bford @nikitaborisov

@robaho

robaho commented Dec 3, 2018

Performing time quantization should be fairly straightforward. Since the whole point of constant-time math is a known number of memory accesses and math operations, knowing the algorithm you should be able to calculate these for a given bit size. Just run a warmup loop at startup to calculate an upper bound and use that in the quantization.

Might be even easier to perform random samples at the start.

The papers just suggest using a rolling average window for the bounds.

@conradoplg
Contributor

That's not how cache attacks work. An attacker, in another process, can measure the effect the attacked algorithm had on the CPU cache.

https://www.cs.tau.ac.il/~tromer/papers/cache-joc-20090619.pdf

Another approach is to normalize all timings to a fixed value, by adding appropriate delays to the encryption, but beside the practical difficulties in implementing this, it means all encryptions have to be as slow as the worst-case timing (achieved here when all memory accesses miss the cache). Neither of these provide protection against the Prime+Probe synchronous attack or the asynchronous attack.

https://eprint.iacr.org/2013/448.pdf

Like FLUSH+RELOAD, these side-channel attacks rely on data flow from secret exponent bits to memory access patterns. These attacks can be prevented by using constant time exponentiation, where the sequence of instructions and memory locations accessed are fixed and do not depend on the value of the exponent bits.

https://www.cs.unc.edu/~reiter/papers/2012/CCS.pdf

There exists a long line of work on cryptographic algorithms designed to be side-channel resistant (e.g., [11, 33, 35, 35, 36]). Recent versions of some cryptographic libraries attempt to prevent the most egregious side-channels; e.g., one can use the Montgomery ladder algorithm [30] for exponentiation or even a branchless algorithm.

No, it's not straightforward at all to perform time quantization unless you're running on a real-time operating system. Code runs in shared environments. There could be tons of processes running; some process could be using the entire RAM, so the crypto code needs to read from swap and could take one minute to run. Should we wait one minute in all crypto operations?

@rsc
Contributor

rsc commented Dec 3, 2018

@robaho I appreciate your zeal in arguing that we should do nothing, but it seems premature. This proposal has not yet worked out what the best something we could do is. Once there's a more concrete something, it would be fine to argue that it's worse than doing nothing. For better or worse this discussion died down, and it hasn't picked up again. Your arguments are worth considering against any possible future "something", of course, and it's good to have them recorded, but you probably don't need to keep making them at this point. They've been recorded. We're not going to unilaterally close the issue at this point.

@robaho

robaho commented Dec 3, 2018

I wasn’t arguing to do nothing. I was arguing that what I read shows there are alternative approaches to solving the underlying problem that might be easier to implement and verify, especially in an environment like Go.

I am still reading the recently cited papers. Thank you for those.

@robaho

robaho commented Dec 3, 2018

@conradoplg thanks for the papers. The prime+probe attack does indeed get around time quantization, though it’s somewhat doubtful that this could be pulled off on a modern multi-processor, multi-core system; but that doesn’t matter. Call it a given that it can.

The papers raise three concerns with this proposal:

  1. These attacks are predicated on running the attacker alongside the victim, which is probably pretty rare these days, at least in a modern platform environment; and even if so, it would be in a VM and probably CPU-isolated for performance.

  2. None of the suggested countermeasures use fixed-time math. They are all higher-level changes.

  3. The attacks and countermeasures rely on methods not available in Go.

So time quantization is still an effective countermeasure against external attacks.

I am still reading the other papers. Thank you.

@robaho

robaho commented Dec 3, 2018

@conradoplg the second paper states,

“Addressing this weakness requires a hardware fix, which, unless implemented as a microcode update, will not be applicable to existing hardware.
Preventing page sharing also blocks the FLUSH+RELOAD technique. Given the strength of the attack, we believe that the memory saved by sharing pages in a virtualised environment does not justify the breach in the isolation between guests. We, therefore, recommend that memory de-duplication be switched off.”

So again, this proposal will not prevent that attack.

@robaho

robaho commented Dec 3, 2018

The third paper starts with “Our investigations presume that in some manner an attacker has achieved control of a VM co-resident on the same physical computer as the victim VM, such as by compromising an existing VM that is co-resident with the victim.” which is not a Go concern. It also states “The scheduling nuances abused are a vulnerability in their own right, enabling degradation-of-service attacks and possibly cycle-stealing attacks [44, 50].”

And the countermeasures proposed do not use fixed-time math. In fact, none of the five cited papers proposes fixed-time math as a solution, and those that discuss it note the difficulties of doing it with varying CPU and cache architectures.

As an aside, the third paper is pretty hokey, as it assumes the other VM is performing only (or nearly only) crypto work. The chances of the attack in this paper working outside of a lab are astronomically small.

Furthermore, most real-world systems should have the crypto work performed by an isolated machine or device, so then you are back to an external attack, for which time quantization works.

As I said, security is holistic.

@josharian
Contributor

I am extremely hesitant to add to this thread, but...

  1. @robaho you write "in an environment like Go" and "modern platform environment" and "modern multi processor multi core system" and "deployed behind a network with multiple hops" and "running in a vm" and "a random machine with random hardware" and "the machine has a highly random load factor". Go is used in many, many environments. I have personally written production Go code that used crypto/* for which basically none of those assumptions hold true.

  2. @robaho I think your position is eminently clear to everyone reading this thread. Further discussion here now is extremely unlikely to be fruitful. And there are more appropriate fora for engaging with the literature on cryptographic attacks.

@creker

creker commented Dec 3, 2018

@robaho the second paper clearly states that constant-time exponentiation prevents the attack in question and similar ones:

Like FLUSH+RELOAD, these side-channel attacks rely on data flow from secret exponent bits to memory access patterns. These attacks can be prevented by using constant time exponentiation

The pattern of accesses to memory lines of the OpenSSL [41] implementation of RSA exponentiation is not dependent on secret exponent bits. Consequently, even though the implementation is not constant time [11], it is not vulnerable to our attack.

Your quote is out of context. The problem they describe is that constant-time math can only be used in crypto code and does not apply to other software:

FLUSH+RELOAD can be applied to extract secret data from non-cryptographic software. For such software, the performance costs of constant-time computation are unreasonable, hence other solutions are required

Your other quote describes the general problem of a vulnerable x86 instruction that can only be fixed properly in hardware. It doesn't mean constant-time math doesn't fix it.

@robaho

robaho commented Dec 3, 2018

Also, @conradoplg my understanding of the cache timing attack may have been slightly restrictive, but I stated very generically that it is based on timing differences due to the use of the cache. These attacks are doing that as well, by timing their own methods, not the crypto code, and looking for differences to gain knowledge of the crypto state.

I really do appreciate you taking the time to provide those papers. They were very interesting. Thank you.

@robaho

robaho commented Dec 3, 2018

@josharian I get it. This will be my last post on this. @creker, as your second statement makes clear, OpenSSL was not subject to the attack for a different reason. That is what I’ve been proposing: using alternatives to fixed-time math, because it will be very difficult to implement and VERIFY the way Go is designed, and solving these issues in the crypto layer, not the math library. In the second paper, they make changes to the exponentiation code within the crypto, not to the underlying exponentiation code in the math library.

@creker you were of course correct, the quote was out of context; as the mitigation section states, obscuring the memory access patterns thwarts the attack. Sorry.

But still, you all seem focused on your solution, so have at it; but when it only works reliably on platform X, don’t say I didn’t warn you. If anything, create a crypto.math library that is a catch basin for primitives, but it seems that the attacks are more easily thwarted elsewhere in the chain. That is, just because the crypto uses constant-time primitives doesn’t mean it won’t have leaks higher up.

@creker

creker commented Dec 3, 2018

@robaho OpenSSL was secure for the very same reason:

These attacks can be prevented by using constant time exponentiation, where the sequence of instructions and memory locations accessed are fixed and do not depend on the value of the exponent bits.

Constant-time crypto prescribes that your memory access patterns do not depend on secret data. The OpenSSL implementation is only half of the equation, but it's enough to prevent the attack. Constant-time crypto would also prevent it, as it's even stricter than what OpenSSL has.

@gopherbot

Change https://golang.org/cl/326012 mentions this issue: crypto/rsa: replace big.Int for encryption and decryption

@rsc
Contributor

rsc commented May 4, 2022

In the time that has passed since this proposal was filed, we've moved toward making crypto less and less dependent on math/big, so that math/big can move outside the security perimeter. So it is probably time to officially decline this proposal.

@rsc rsc moved this from Incoming to Declined in Proposals (old) May 4, 2022
@rsc
Contributor

rsc commented May 4, 2022

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed May 4, 2022
gopherbot pushed a commit that referenced this issue Nov 19, 2022
Infamously, big.Int does not provide constant-time arithmetic, making
its use in cryptographic code quite tricky. RSA uses big.Int
pervasively, in its public API, for key generation, precomputation, and
for encryption and decryption. This is a known problem. One mitigation,
blinding, is already in place during decryption. This helps mitigate the
very leaky exponentiation operation. Because big.Int is fundamentally
not constant-time, it's unfortunately difficult to guarantee that
mitigations like these are completely effective.

This patch removes the use of big.Int for encryption and decryption,
replacing it with an internal nat type instead. Signing and verification
are also affected, because they depend on encryption and decryption.

Overall, this patch degrades performance by 55% for private key
operations, and 4-5x for (much faster) public key operations.
(Signatures do both, so the slowdown is worse than decryption.)

name                    old time/op  new time/op    delta
DecryptPKCS1v15/2048-8  1.50ms ± 0%    2.34ms ± 0%    +56.44%  (p=0.000 n=8+10)
DecryptPKCS1v15/3072-8  4.40ms ± 0%    6.79ms ± 0%    +54.33%  (p=0.000 n=10+9)
DecryptPKCS1v15/4096-8  9.31ms ± 0%   15.14ms ± 0%    +62.60%  (p=0.000 n=10+10)
EncryptPKCS1v15/2048-8  8.16µs ± 0%  355.58µs ± 0%  +4258.90%  (p=0.000 n=10+9)
DecryptOAEP/2048-8      1.50ms ± 0%    2.34ms ± 0%    +55.68%  (p=0.000 n=10+9)
EncryptOAEP/2048-8      8.51µs ± 0%  355.95µs ± 0%  +4082.75%  (p=0.000 n=10+9)
SignPKCS1v15/2048-8     1.51ms ± 0%    2.69ms ± 0%    +77.94%  (p=0.000 n=10+10)
VerifyPKCS1v15/2048-8   7.25µs ± 0%  354.34µs ± 0%  +4789.52%  (p=0.000 n=9+9)
SignPSS/2048-8          1.51ms ± 0%    2.70ms ± 0%    +78.80%  (p=0.000 n=9+10)
VerifyPSS/2048-8        8.27µs ± 1%  355.65µs ± 0%  +4199.39%  (p=0.000 n=10+10)

Keep in mind that this is without any assembly at all, and that further
improvements are likely possible. I think having a review of the logic
and the cryptography would be a good idea at this stage, before we
complicate the code too much through optimization.

The bulk of the work is in nat.go. This introduces two new types: nat,
representing natural numbers, and modulus, representing moduli used in
modular arithmetic.

A nat has an "announced size", which may be larger than its "true size",
the number of bits needed to represent this number. Operations on a nat
will only ever leak its announced size, never its true size, or other
information about its value. The size of a nat is always clear based on
how its value is set. For example, x.mod(y, m) will make the announced
size of x match that of m, since x is reduced modulo m.

Operations assume that the announced size of the operands match what's
expected (with a few exceptions). For example, x.modAdd(y, m) assumes
that x and y have the same announced size as m, and that they're reduced
modulo m.

Nats are represented over unsaturated bits.UintSize - 1 bit limbs. This
means that we can't reuse the assembly routines for big.Int, which use
saturated bits.UintSize limbs. The advantage of unsaturated limbs is
that it makes Montgomery multiplication faster, by needing fewer
registers in a hot loop. This makes exponentiation faster, which
consists of many Montgomery multiplications.
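
As a rough illustration of those two ideas together (a fixed announced size plus unsaturated limbs), addition might look like the sketch below; the names are illustrative and not the actual nat.go API:

package natsketch

import "math/bits"

const (
	limbBits = bits.UintSize - 1 // unsaturated: one spare bit per limb
	limbMask = (1 << limbBits) - 1
)

// nat is a natural number with a fixed announced size of len(limbs),
// least-significant limb first. Operations loop over the announced
// size and never inspect the true size.
type nat struct {
	limbs []uint
}

// add sets x = x + y and returns the carry, assuming equal announced
// sizes. Every limb is touched unconditionally and there are no
// value-dependent branches, so timing depends only on the announced size.
func (x *nat) add(y *nat) uint {
	var carry uint
	for i := range x.limbs {
		sum := x.limbs[i] + y.limbs[i] + carry // the spare bit prevents overflow
		x.limbs[i] = sum & limbMask
		carry = sum >> limbBits
	}
	return carry
}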

Moduli use nat internally. Unlike nat, the true size of a modulus always
matches its announced size. When creating a modulus, any zero padding is
removed. Moduli will also precompute constants when created, which is
another reason why having a separate type is desirable.

Updates #20654

Co-authored-by: Filippo Valsorda <filippo@golang.org>
Change-Id: I73b61f87d58ab912e80a9644e255d552cbadcced
Reviewed-on: https://go-review.googlesource.com/c/go/+/326012
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
@golang golang locked and limited conversation to collaborators May 4, 2023