
crypto: new assembly policy #37168

Closed
FiloSottile opened this issue Feb 11, 2020 · 32 comments
Labels: FrozenDueToAge, Proposal, Proposal-Accepted, Proposal-Crypto (Proposal related to crypto packages or other security issues)

@FiloSottile
Contributor

The crypto packages are unfortunately rich in assembly implementations, which are especially hard to review and maintain due to the diversity of architectures, and to the complexity of the optimizations.

Our current assembly policy at golang.org/wiki/AssemblyPolicy has failed to make reviews manageable with the resources the security team currently has. https://go-review.googlesource.com/q/hashtag:crypto-assembly

The result is suboptimal for everyone: implementers have to follow more rules, but reviewers still can’t effectively review their CLs, and no one is happy.

I am proposing a new, much stricter policy. This acknowledges the reality that assembly reviews are currently not moving at an acceptable pace, and shifts more of the load onto the implementers, but with a promise that their work won’t be wasted in the review queue. It should also progressively increase the maintainability, reliability, and security of the assembly codebases, as well as surface improvement areas for the compiler so that the assembly can eventually be removed.

This policy would apply to all packages in crypto/... and golang.org/x/crypto/.... Due to its use in cryptographic packages, and to the fact that it's partially maintained by the security team, this policy would also extend to math/big.


  • We prefer portable Go, not assembly. Code in assembly means (N packages * M architectures) to maintain, rather than just N packages.
  • Minimize use of assembly. We'd rather have a small amount of assembly for a 50% speedup than twice as much assembly for a 55% speedup. Explain the decision to place the assembly/Go boundary where it is in the commit message, and support it with benchmarks.
  • Use higher level programs to generate assembly, either standalone Go programs or go get-able programs, like avo. Output of other reproducible processes (like formally verified code generators) will also be considered. Discuss the implementation strategy on the issue tracker in advance.
  • Use small, testable units (25–75 lines) called from higher-level logic written in Go. If using small, testable units called from logic written in Go is too slow, use small, testable assembly functions with Go-compatible wrappers, so that Go tests can still test the individual units.
  • Any assembly function needs a reference Go implementation, that’s tested and fuzzed side-by-side with the assembly. Follow golang.org/wiki/TargetSpecific for structure and testing practices.
  • Document in the Go code why the implementation requires assembly (specific performance benefit, access to instructions, etc), so we can reevaluate as the compiler improves.
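
As a rough illustration of the last two bullets, a side-by-side test might look like the following sketch. The package, file, and function names (sum, sumGeneric) are invented for the example, not taken from the Go tree:

```go
// sum_test.go — hypothetical example only.
package sum

import (
	"math/rand"
	"testing"
)

// sumGeneric is the portable reference implementation, always compiled.
func sumGeneric(b []byte) uint64 {
	var s uint64
	for _, v := range b {
		s += uint64(v)
	}
	return s
}

// sum would normally live in a build-tagged file and be backed by assembly
// on supported architectures; it forwards to the reference here so the
// sketch compiles everywhere.
func sum(b []byte) uint64 { return sumGeneric(b) }

// TestSumMatchesGeneric exercises the (possibly assembly-backed) sum against
// the reference implementation on many input sizes.
func TestSumMatchesGeneric(t *testing.T) {
	for size := 0; size <= 512; size++ {
		b := make([]byte, size)
		rand.Read(b)
		if got, want := sum(b), sumGeneric(b); got != want {
			t.Errorf("size %d: sum = %d, sumGeneric = %d", size, got, want)
		}
	}
}
```

On architectures where sum is backed by assembly, the same test then exercises the assembly and the reference side by side.
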
@gopherbot gopherbot added this to the Proposal milestone Feb 11, 2020
@FiloSottile
Contributor Author

/cc a number of crypto assembly implementers: @mundaym @laboger @mengzhuo @ceseo @crvv @TocarIP @aead @vkrasnov @ncw @gtank @bmakam-qdt

@FiloSottile FiloSottile added the Proposal-Crypto Proposal related to crypto packages or other security issues label Feb 11, 2020
@mundaym
Member

mundaym commented Feb 12, 2020

In general the policy seems good to me, though I do feel that if a contributor is following the current assembly policy then they would already be meeting the bar of the new one. Maybe we need to expand the advice even more, I'm not sure.

One thing that this proposal doesn't address is the fairly common practice of taking assembly from another project (e.g. OpenSSL) and porting it to Go assembly. I see this in reviews every so often. One concrete thing that I think we should encourage people to do is to document the repositories and, importantly, the exact version control revisions that their code is based on so that it is easier to check for upstream changes. Having said that, when reviewing such code do you think that it is better to encourage keeping the code close to the original so that it is easier to update with upstream changes (assuming anyone ever bothers to check for them) or to modify it more heavily to fit with the policy? Sometimes contributors don't fully understand the implementation or algorithm they are porting and so it will be very hard for them to rework it to fit with the policy. Or they may just be reluctant to change the code too much. How firm should we be in that sort of situation, assuming it is otherwise well tested?

One other point I'd like to make is that I don't think we should mandate the use of code generation. It should only be used when it would increase maintainability. Short hand-written assembly files can be preferable in some situations.

@bwesterb

Yes, I agree. I switched go-ristretto from assembly to plain Go for Go 1.13, since it has bits.Mul64. Performance is a bit worse, but that's an acceptable trade-off for the increased maintainability. It would be nice to have more "intrinsics" support in Go, like Rust's u32x4, etc. Otherwise I don't see how to reach acceptable performance for hashes.
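
For context, math/bits.Mul64 returns the full 128-bit product, which is the building block 64-bit limb arithmetic needs. A minimal sketch of the kind of portable code it enables (not taken from go-ristretto; mulAdd64 is an invented helper):

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulAdd64 returns hi:lo = a*b + c without any assembly.
func mulAdd64(a, b, c uint64) (hi, lo uint64) {
	hi, lo = bits.Mul64(a, b)
	lo, carry := bits.Add64(lo, c, 0)
	hi += carry
	return hi, lo
}

func main() {
	hi, lo := mulAdd64(1<<63, 2, 5)
	fmt.Println(hi, lo) // 1 5
}
```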

@josharian
Contributor

josharian commented Feb 12, 2020

@bwesterb please do file issues when you encounter specific intrinsics that are important for eliminating assembly. There’s uncertainty around how vector intrinsics should work, but I think there’s openness to new scalar intrinsics.

Edit: just realized you were asking about a vector intrinsic. Sorry. I’d personally like to have those, too, but no one has put together a complete design yet. There’s some prior discussion in another issue I can’t find at the moment.

@rsc
Contributor

rsc commented Feb 12, 2020

I seem to remember a discussion when we wrote https://github.com/golang/go/wiki/AssemblyPolicy (only 18 months ago). Does anyone know if there's a link somewhere to that discussion?

@FiloSottile
Contributor Author

In general the policy seems good to me, though I do feel that if a contributor is following the current assembly policy then they would already be meeting the bar of the new one. Maybe we need to expand the advice even more, I'm not sure.

Hmm, the major differences should be a stronger preference for code generation, and the requirement for a Go reference implementation that matches small assembly functions 1:1. For example, there are generic implementations that do 32-bit limbs while the assembly does 64-bit limbs, and that would not be acceptable anymore, or assembly where the only entry point is ScalarMult, which would have to be broken up into a dozen or so functions.

One thing that this proposal doesn't address is the fairly common practice of taking assembly from another project (e.g. OpenSSL) and porting it to Go assembly. I see this in reviews every so often. One concrete thing that I think we should encourage people to do is to document the repositories and, importantly, the exact version control revisions that their code is based on so that it is easier to check for upstream changes. Having said that, when reviewing such code do you think that it is better to encourage keeping the code close to the original so that it is easier to update with upstream changes (assuming anyone ever bothers to check for them) or to modify it more heavily to fit with the policy? Sometimes contributors don't fully understand the implementation or algorithm they are porting and so it will be very hard for them to rework it to fit with the policy. Or they may just be reluctant to change the code too much. How firm should we be in that sort of situation, assuming it is otherwise well tested?

I should mention that by default, this is not ok license-wise. There's a path to bring in assembly that was dual-licensed as part of cryptogams, and that's what was taken a couple times.

I think we tried the "keep the port close to the source" route, and it does not meet the standards of maintainability we'd like to have, also because the source is often in perlasm or something like that, and the calling conventions are different, and Go has reserved registers, etc.

I'd like us to try developing a small and curated set of assembly implementations rather than bring in large amounts from other projects with different priorities.

One other point I'd like to make is that I don't think we should mandate the use of code generation. It should only be used when it would increase maintainability. Short hand-written assembly files can be preferable in some situations.

This is particularly experimental, and I'm willing to reconsider after we've seen some results, but I'd like to try it out, as I think it might lead to much better readability and portability. Maybe it will even force the creation of more tools like avo that can abstract some of the useless manual work.

@aead
Contributor

aead commented Feb 12, 2020

Should the new assembly policy make any statement about when assembler code eventually gets removed or is very unlikely to get merged?

It's probably not worth continuing to maintain M asm implementations of a "broken" cryptographic primitive. One example I remember is the crypto/rc4 package. See: #25417 and https://go-review.googlesource.com/c/go/+/130397/

Similarly, we may not want to maintain more asm code for "outdated" primitives - e.g. SHA-1.
(Yeah, technically SHA-1 is broken as well - but it's still commonly used because of legacy, git, ...). For such primitives it seems unlikely that new asm code is worth the maintenance cost.

However, both aspects are quite specific to cryptographic implementations. So they may be too specific for a general assembly policy.

@as
Contributor

as commented Feb 13, 2020

Use higher level programs to generate assembly, either standalone Go programs or go get-able programs, like avo.

Can we clarify how "high level programs to generate assembly" make the code more secure? If anything, I would think they require more scrutiny. The author of a program that calls a third-party program to generate assembly has no idea why the program generated it the way it did. They need to understand their implementation in their high level language as well as review and understand the low-level assembly generated by the third-party program. It is a mistake to delegate trust to a third party code generator to generate assembly for a unique cryptographic primitive.

Output of other reproducible processes (like formally verified code generators) will also be considered.

Which processes? What is the context of this statement? Are we talking about guarantees that the instructions aren't corrupting memory, or about verifying that the implementation itself is doing the right thing for all possible inputs? The latter seems impossible, and the intent of formal verification should be clearly stated in the proposal to reduce confusion.

@thepudds
Contributor

thepudds commented Feb 13, 2020

Hi @FiloSottile, FWIW I like the general philosophy here around active experimentation to see if a new approach yields better results, and it sounds like the concept is to observe and refine based on how the experiment goes.

A few questions on the fuzzing piece:

  1. Would it make sense to require hooking up to oss-fuzz or similar for on-going fuzzing?
  2. Would it help to include somewhere some sample template fuzzing functions, including side-by-side fuzzing, or perhaps instead include some pointers to some good concrete examples? (Maybe this would not be part of the policy itself, but perhaps on an auxiliary wiki page or similar; a rough sketch of such a template follows this list.)
  3. Would it make sense to recommend or possibly require fuzzing with > 1 engine (such as native dvyukov/go-fuzz as well as libFuzzer) given different engines often surface different bugs? In other words, the policy wouldn't necessarily need to say you must use engine X and Y, but could say the count of engines used should be > 1. (Or maybe that is setting too high of a bar).
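
On point 2, a side-by-side harness in the dvyukov/go-fuzz style could be as small as the following sketch (sum and sumGeneric are placeholder names for an assembly-backed function and its pure Go reference, not existing APIs):

```go
// +build gofuzz

package sum

import "fmt"

// Fuzz compares the (possibly assembly-backed) sum against the portable
// reference on every input the fuzzer generates.
func Fuzz(data []byte) int {
	got := sum(data)
	want := sumGeneric(data)
	if got != want {
		panic(fmt.Sprintf("sum(%x) = %d, sumGeneric = %d", data, got, want))
	}
	return 1
}
```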

@mmcloughlin
Contributor

Can we clarify how "high level programs to generate assembly" make the code more secure? If anything, I would think they require more scrutiny. The author of a program that calls a third-party program to generate assembly has no idea why the program generated it the way it did. They need to understand their implementation in their high level language as well as review and understand the low-level assembly generated by the third-party program. It is a mistake to delegate trust to a third party code generator to generate assembly for a unique cryptographic primitive.

Firstly a disclaimer: I wrote avo. I'm pleased to see it mentioned in this proposal as one of many tools available to help reduce complexity in high performance crypto assembly. That said, I'm not naive enough to think it's always the right tool for the job!

The claim is that code generation makes code easier to write, review and maintain, ultimately making bugs less likely. It allows us to assign meaningful names to variables, and use control structures to handle the repetitive block structures that are common in cryptographic primitives (avo SHA-1 example). I would claim that concise code generators written this way are more likely to be reviewable than thousands of lines of assembly (see p256 and AES-GCM).

I would also point out that this isn't a novel idea; in fact, code generation is already used extensively in crypto assembly, specifically via preprocessor macros.

Preprocessor being used for "register allocation" in AES-GCM:

#define B0 X0
#define B1 X1
#define B2 X2
#define B3 X3
#define B4 X4
#define B5 X5
#define B6 X6
#define B7 X7

Here's a poor-man's "function" defined with the preprocessor:

#define mulRoundAAD(X ,i) \
MOVOU (16*(i*2))(pTbl), T1;\
MOVOU T1, T2;\
PCLMULQDQ $0x00, X, T1;\
PXOR T1, ACC0;\
PCLMULQDQ $0x11, X, T2;\
PXOR T2, ACC1;\
PSHUFD $78, X, T1;\
PXOR T1, X;\
MOVOU (16*(i*2+1))(pTbl), T1;\
PCLMULQDQ $0x00, X, T1;\
PXOR T1, ACCM

Likewise in P-256:

#define acc0 R8
#define acc1 R9
#define acc2 R10
#define acc3 R11
#define acc4 R12
#define acc5 R13
#define t0 R14
#define t1 R15

#define p256MulBy2Inline\
XORQ mul0, mul0;\
ADDQ acc4, acc4;\
ADCQ acc5, acc5;\
ADCQ acc6, acc6;\
ADCQ acc7, acc7;\
ADCQ $0, mul0;\
MOVQ acc4, t0;\
MOVQ acc5, t1;\
MOVQ acc6, t2;\
MOVQ acc7, t3;\
SUBQ $-1, t0;\
SBBQ p256const0<>(SB), t1;\
SBBQ $0, t2;\
SBBQ p256const1<>(SB), t3;\
SBBQ $0, mul0;\
CMOVQCS acc4, t0;\
CMOVQCS acc5, t1;\
CMOVQCS acc6, t2;\
CMOVQCS acc7, t3;

Again in blake2b

https://github.com/golang/crypto/blob/86ce3cb696783b739e41e834e2eead3e1b4aa3fb/blake2b/blake2b_amd64.s#L33-L40

I could go on. My point is that code generation is already being used by Go crypto in some sense; it's just that the preprocessor is a particularly awful form of it (in my opinion). And Go isn't alone here: OpenSSL's perlasm isn't something we'd like to copy, but it's further evidence of code generation in this space. Likewise qhasm in SUPERCOP/NaCL and other projects. I think there's a case to be made that the proposed assembly policy isn't saying that much new; it's just encouraging people to use better tools than the Go assembly preprocessor.
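
For concreteness, here is roughly what a minimal avo generator looks like (adapted from avo's documentation; the Add function is a toy example, not crypto code):

```go
//go:build ignore

package main

import (
	. "github.com/mmcloughlin/avo/build"
)

func main() {
	TEXT("Add", NOSPLIT, "func(x, y uint64) uint64")
	Doc("Add adds x and y.")
	x := Load(Param("x"), GP64())
	y := Load(Param("y"), GP64())
	ADDQ(x, y)               // register-to-register add
	Store(y, ReturnIndex(0))
	RET()
	Generate()               // writes the assembly and Go stubs
}
```

Run with something like go run asm.go -out add.s -stubs stub.go; the generated .s file is what gets checked in and reviewed alongside the generator.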

Your point that it's also necessary to review the code generator itself is completely fair and correct. I'd make a few points in response to this, some general, some defending avo in particular.

  1. You don't have to use a third-party package at all. It can simply be a make_asm.go program that prints assembly output. Perhaps something like this meow hash assembly generator. It can be even simpler than this as well. There's nothing prescriptive about this policy; only bring in a third-party package if you think it's warranted.
  2. avo isn't that complicated. I'm proud of the project, but I would never claim it was rocket science. I have deliberately avoided features that would drift into "compiler" territory. It contains a Go wrapper around every x86 instruction as well as a register allocator. Most of the complexity lies in the register allocation (and associated passes), and that logic ultimately isn't that much code. Apart from that it's a cleaner way to print assembly instructions, and allows you to use Go functions and control structures to structure your code generator.
  3. avo is as complex as you want it to be. If you're suspicious of the register allocator and associated complexity, you don't have to use it. avo supports just using physical registers directly.
  4. When done well, the assembly is reviewable too. I wouldn't claim this was always possible, but tools like avo produce assembly output (they are not assemblers). You can insert comments in the assembly output as well, and it'll format it nicely, like asmfmt. When this is done carefully, the output actually looks quite nice. If you can achieve this you are in a strictly better situation than you were before: you can still review the raw assembly but you have the code generator alongside it too.

Which processes? What is the context of this statement? Are we talking about guarantees that the instructions aren't corrupting memory or verifying that the implementation itself is doing the right thing for all possible inputs. The latter seems impossible, and the intent of formal verification should be clearly stated in the proposal to reduce confusion.

I can't speak for @FiloSottile, but based on a conversation we had at Real World Crypto I have some context. It seems that after the HACS workshop there is some chance that the fiat-crypto team will add support for Go output. If so this could be a great outcome for the Go community, and I assume the intention was for the assembly policy to encompass auto-generated code from formally verified tooling.

@FiloSottile I appreciate you putting this together and I'm in broad agreement. I'm glad to see the emphasis that this is an experiment -- we can learn as we go. I wonder if it might be a good idea for us to concretize this in some specific cases, to see how it might play out. Perhaps we could pick one or more of the critical primitives and ask ourselves how we would have implemented it under the proposed policy (for example P-256, AES-GCM, curve/ed25519). By the way I'm not suggesting we write any code, just that we sketch out what a policy-compliant implementation would look like.

@ncw
Contributor

ncw commented Feb 16, 2020

I am proposing a new, much stricter policy. This acknowledges the reality that assembly reviews are currently not moving at an acceptable pace, and shifts more of the load onto the implementers, but with a promise that their work won’t be wasted in the review queue.

I've contributed several ARM crypto implementations. It is work I enjoy and am suited to, having written tens of thousands of lines of ARM assembler in the past. However, spending weeks writing assembly code only to not have it merged really sucks!

I think this is an excellent initiative. It will require some restraint on the part of the assembly authors who naturally want to go for as fast as possible.

I like the idea of code generation. I've used the preprocessor extensively along with register name definitions which makes life easier. The preprocessor is a poor second to a good macro assembler though.

Alas, avo is x86-only at the moment, isn't it? I'm not sure there are any ARM tools at the moment (I'd love to be corrected if wrong).

@rsc rsc changed the title proposal: new cryptographic assembly policy proposal: crypto: new assembly policy Mar 4, 2020
@crvv
Contributor

crvv commented Mar 25, 2020

As for code generation, there are examples for comparison.

https://go-review.googlesource.com/c/go/+/51670
https://go-review.googlesource.com/c/go/+/136896

The code in CL 51670 is hand-written. The code has the exact same structure as the s390x implementation.
The .s file has 124 lines. I think all lines below line 71 are trivial (they are copied from $GOROOT/src/crypto/aes/asm_amd64.s).
There are some repetitions in the first 70 lines. If a reviewer reads 50 lines in the .s file, I think the code can be understood very well.

The code generator in CL 136896 has 298 lines. It is clear that I need more time to read the 298 lines of Go than those 124 lines of assembly. Besides, there are another 740 lines of assembly code to review.

If I am the reviewer, I need much more time to review CL 136896.
Although the code in CL 51670 is written by me, I still think the point is reasonable for others.
Code generation can be helpful in some situations, but it doesn't fit all situations.

@mmcloughlin
Contributor

@crvv According to my measurements these two CLs have substantially different performance. See the following gist for the methodology, and please let me know if I got something wrong.

https://gist.github.com/mmcloughlin/70598689ad2da5c6aadf19e30b6642a1

Here's the benchcmp output between the two CLs:

benchmark                   old ns/op     new ns/op     delta
BenchmarkAESGCMSeal1K-8     556           558           +0.36%
BenchmarkAESGCMOpen1K-8     530           534           +0.75%
BenchmarkAESGCMSign8K-8     2049          2045          -0.20%
BenchmarkAESGCMSeal8K-8     3366          3362          -0.12%
BenchmarkAESGCMOpen8K-8     3284          3289          +0.15%
BenchmarkAESCTR1K-8         671           376           -43.96%
BenchmarkAESCTR8K-8         4826          2294          -52.47%

benchmark                   old MB/s     new MB/s     speedup
BenchmarkAESGCMSeal1K-8     1840.43      1834.91      1.00x
BenchmarkAESGCMOpen1K-8     1929.99      1917.47      0.99x
BenchmarkAESGCMSign8K-8     3997.98      4005.84      1.00x
BenchmarkAESGCMSeal8K-8     2433.57      2436.21      1.00x
BenchmarkAESGCMOpen8K-8     2494.19      2490.21      1.00x
BenchmarkAESCTR1K-8         1516.47      2704.17      1.78x
BenchmarkAESCTR8K-8         1696.40      3568.81      2.10x

Note I've included AES-GCM benchmarks partly to show the benchmarks ran in similar environments, but also because it should be possible to write an AES-CTR mode implementation with similar performance to AES-GCM, if not better.

I believe this experiment shows that CL 136896 is approx 2x faster than CL 51670. Therefore this is a classic trade-off. CL 136896 is unquestionably more complicated than 51670, but one could argue the complexity is justified for the speedup.

CL 136896 uses some of the techniques employed in the GCM implementation, producing performance fairly close to GCM for the 8K benchmark. But I would argue that the code generator used in the CL makes it easier to understand than gcm_amd64.s (1286 hand-written lines).

@crvv
Contributor

crvv commented Mar 25, 2020

There is no doubt that the complex one is faster.
The point is the code generator is not always helpful.
If you write the simple algorithm in CL 51670, I doubt you will use the code generator.
The benefit is small.

As for the speed difference between the two CLs, it depends on the CPU architecture.
CL 136896 is 10%-100% faster than CL 51670.
By the way, on my computers, OpenSSL 1.1.1d is 20%-80% faster than CL 136896.

@ncw
Contributor

ncw commented Mar 25, 2020

@crvv wrote:

The point is the code generator is not always helpful.

I'd agree in this case. I think you could do everything the code generator does with preprocessor macros in the .s file itself, and it would be clearer what is going on, rather than having to work out how the code generator works and then puzzle through how what it did matches the generated assembler code.

I personally think the solution to this is to make the macro system in the go assembler more powerful and then mandate that all assembler code is written with that, rather than third party or ad-hoc code generation tools.

@mmcloughlin
Contributor

There is no doubt that the complex one is faster.
The point is the code generator is not always helpful.
If you write the simple algorithm in CL 51670, I doubt you will use the code generator.
The benefit is small.

@crvv Yes, fair point and I probably agree that CL 51670 wouldn't benefit from code generation.

I was reacting to your previous comment, in which it seemed to me you compared the two CLs as if the only difference was that one uses code generation and the other doesn't. One is up to 2x faster than the other; that needs to be mentioned in any comparison.

By the way, on my computers, OpenSSL 1.1.1d is 20%-80% faster than CL 136896.

Yes, CL 136896 isn't optimal either.

@rsc
Contributor

rsc commented Apr 1, 2020

To clarify, the goal is to present the assembly in the simplest, most reviewable form. Sometimes it is small and that's a direct .s file. Sometimes not, and a generator is better. And yes, one can write difficult-to-understand generators - don't do that. Strive to find the simplest way to express what you need to express. That's true for all code.

I think everyone basically agrees that CL 51670 doesn't need a generator, although I think it's close to the upper limit for that category. And I think basically everyone agrees that CL 136896's generator is much better than the 700 lines of assembly it generates. The question of whether a particular optimization is worth the added complexity is almost always going to be context-dependent. (In that specific case, a factor of two was deemed worthwhile. We're not here to second-guess that.)

@rsc
Contributor

rsc commented Apr 1, 2020

From reading the discussion, it sounds like people are generally in favor of the policy @FiloSottile put in the comment atop this issue. Do I have that right? Is anyone opposed to this policy?

(There was some discussion about tradeoffs in the specific AES-CTR case, but the policy is not about settling every possible tradeoff.)

@mmcloughlin
Contributor

Is anyone opposed to this policy?

As far as I can tell from the discussion, there is pushback against the phrasing "Use higher level programs to generate assembly", in that it appears to suggest code generation is required. As you say it seems there is consensus here that for small enough programs code generation may be counterproductive.

Perhaps a slight softening of that language would be good? It would be nice if we could be more specific about when code generation is required, but it seems quite difficult to strictly define.

@laboger
Contributor

laboger commented Apr 2, 2020

I agree with most of what's written in principle. Use asm only when there is no other way. Write it as clearly as possible, document it well, test it well. Good rules to follow for any asm, not just crypto.

I would just like to point out that much of the policy is written with amd64 in mind. There is no avo equivalent for other architectures. The performance of compiler-generated code is more likely to fall short of asm on architectures other than amd64. Writing small functions could result in more call overhead if the functions can't be inlined.

@FiloSottile
Contributor Author

Thanks everyone for the input and the discussion.

Re: code generation, this is definitely an experiment, and I agree with Russ that there can be more and less complex generators, and they should be weighed against the complexity they save. They can be as simple as some Go functions named after a specific operation (say, field addition) that take parameterized registers and print assembly, so they can be reused and looped (a trivial sketch of that style follows the quoted text below). How about the following language? I think "Discuss the implementation strategy on the issue tracker in advance." gives us space to assess case by case as we go forward.

  • Use higher level programs to generate non-trivial amounts of assembly, either standalone Go programs or go get-able programs, like avo. Output of other reproducible processes (like formally verified code generators) will also be considered. Discuss the implementation strategy on the issue tracker in advance.
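
To illustrate the "parameterized registers" idea, a generator in that style can be nothing more than printf calls wrapped in Go functions. This is a trivial sketch: feAdd, the register choices, and the four-limb layout are invented, and no modular reduction is emitted.

```go
package main

import "fmt"

// feAdd prints an add-with-carry chain for four 64-bit limbs held in the
// given registers, accumulating b into a. Reduction is left out.
func feAdd(a, b [4]string) {
	fmt.Printf("\tADDQ %s, %s\n", b[0], a[0])
	for i := 1; i < 4; i++ {
		fmt.Printf("\tADCQ %s, %s\n", b[i], a[i])
	}
}

func main() {
	// The same helper can be reused and looped with different registers.
	feAdd([4]string{"R8", "R9", "R10", "R11"}, [4]string{"R12", "R13", "R14", "R15"})
}
```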

I would just like to point out that much of the policy is written with amd64 in mind. There is no avo for others. The performance of generated code for Go is more likely to be worse than asm if not amd64. Writing small functions could result in more call overhead if the functions can't be inlined.

That's definitely not the intention, as that would only solve a fraction of the maintenance problem, so I would love to hear more.

Asm is going to be faster than compiler-generated code on amd64 as well, even if the gap might be smaller. The requirement to highlight what the compiler needs to do to replace the asm is meant to narrow that gap over time.

When you say small functions do you mean asm or Go? The requirement is specifically for asm functions, and I realize sometimes using the Go ABI is too expensive, so it's ok to make simple jumpable units with test adapters.

@dgryski
Contributor

dgryski commented Apr 2, 2020

Inlining small assembly functions probably falls under #17373 or #26891

@rsc
Contributor

rsc commented Apr 15, 2020

@FiloSottile, could you please post in a new comment the exact policy that we are now discussing accepting or rejecting. I see an edit a few comments back and just want to make sure everyone agrees how it applies.

@dgryski, note that we won't be inlining arbitrary small assembly functions any time soon. Too much subtlety for the compiler to take on. The path we chose instead was good intrinsified APIs.
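
As a concrete example of that path (a sketch, not code from the tree): math/bits.RotateLeft32 is intrinsified, so a ChaCha20 quarter-round can stay in portable Go and still compile to rotate instructions on architectures that have them. The inputs below are the RFC 8439 §2.1.1 test-vector words.

```go
package main

import (
	"fmt"
	"math/bits"
)

// quarterRound is the ChaCha20 quarter-round written with an intrinsified
// API instead of assembly.
func quarterRound(a, b, c, d uint32) (uint32, uint32, uint32, uint32) {
	a += b
	d = bits.RotateLeft32(d^a, 16)
	c += d
	b = bits.RotateLeft32(b^c, 12)
	a += b
	d = bits.RotateLeft32(d^a, 8)
	c += d
	b = bits.RotateLeft32(b^c, 7)
	return a, b, c, d
}

func main() {
	a, b, c, d := quarterRound(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
	fmt.Printf("%08x %08x %08x %08x\n", a, b, c, d)
}
```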

@surechen
Contributor

Hello, if the above policy is passed, does the golang community need to reconstruct some existing assembly implementations with more than 100 lines? For example, splitting them into more, smaller assembly or Go functions. Compared with the current assembly implementations, this may lead to performance degradation. Will this be accepted by the golang community?

@FiloSottile
Contributor Author

Hello, if the above policy is passed, does the golang community need to reconstruct some existing assembly implementations with more than 100 lines?

Yes, eventually we do want to adapt current code to the policy, although that's lower priority than draining the current review queue. Like with new code, performance is not the ultimate goal, but I'd like to think that as we gain experience with the policy, the degradation will be small or nonexistent.

@FiloSottile, could you please post in a new comment the exact policy that we are now discussing accepting or rejecting. I see an edit a few comments back and just want to make sure everyone agrees how it applies.

Here's the full current text. I also clarified functions vs units.

  • We prefer portable Go, not assembly. Code in assembly means (N packages * M architectures) to maintain, rather than just N packages.
  • Minimize use of assembly. We'd rather have a small amount of assembly for a 50% speedup than twice as much assembly for a 55% speedup. Explain the decision to place the assembly/Go boundary where it is in the commit message, and support it with benchmarks.
  • Use higher level programs to generate non-trivial amounts of assembly, either standalone Go programs or go get-able programs, like avo. Output of other reproducible processes (like formally verified code generators) will also be considered. Discuss the implementation strategy on the issue tracker in advance.
  • Use small, testable units (25–75 lines) called from higher-level logic written in Go. If using small, testable functions called from logic written in Go is too slow, use small, testable assembly units with Go-compatible wrappers, so that Go tests can still test the individual units.
  • Any assembly function needs a reference Go implementation, that’s tested and fuzzed side-by-side with the assembly. Follow golang.org/wiki/TargetSpecific for structure and testing practices.
  • Document in the Go code why the implementation requires assembly (specific performance benefit, access to instructions, etc), so we can reevaluate as the compiler improves.

@rsc
Contributor

rsc commented Apr 22, 2020

Does anyone object to the policy in the preceding comment by @FiloSottile?

@ncw
Contributor

ncw commented Apr 23, 2020

I think the policy looks good.

It might be worth mentioning here that the output of the assembly generators needs to be checked in and have a comment at the start like // Code generated by programName. DO NOT EDIT.

  • Use higher level programs to generate non-trivial amounts of assembly, either standalone Go programs or go get-able programs, like avo. Output of other reproducible processes (like formally verified code generators) will also be considered. Discuss the implementation strategy on the issue tracker in advance.

@as
Contributor

as commented Apr 29, 2020

Use higher level programs to generate non-trivial amounts of assembly, either standalone Go programs or go get-able programs, like avo. Output of other reproducible processes (like formally verified code generators) will also be considered. Discuss the implementation strategy on the issue tracker in advance.

How much assembly qualifies as a non-trivial amount?

@rsc
Contributor

rsc commented Apr 29, 2020

Worth noting that the convention @ncw mentions is documented in the go help generate output, but it would be good to repeat it in this policy.

@as, non-trivial will always be in the eye of the reviewer. Most of the time it's easy to decide. Discussion on the issue tracker in advance will help.

@rsc
Contributor

rsc commented Apr 29, 2020

Based on the discussion above, this seems like a likely accept.

@rsc rsc moved this from Active to Likely Accept in Proposals (old) Apr 29, 2020
@rsc
Contributor

rsc commented May 6, 2020

No change in consensus, so accepted.

@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) May 6, 2020
@rsc rsc modified the milestones: Proposal, Backlog May 6, 2020
@rsc rsc changed the title proposal: crypto: new assembly policy crypto: new assembly policy May 6, 2020
@FiloSottile
Contributor Author

This is now golang.org/wiki/AssemblyPolicy and will apply to the Go 1.16 cycle.

@golang golang locked and limited conversation to collaborators Jul 23, 2021