Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(340)

Issue 7547050: code review 7547050: crypto/rc4: faster amd64, 386 implementations (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
12 years ago by rsc
Modified:
12 years ago
Reviewers:
CC:
agl1, golang-dev
Visibility:
Public.

Description

crypto/rc4: faster amd64, 386 implementations -- amd64 -- On a MacBookPro10,2 (Core i5): benchmark old ns/op new ns/op delta BenchmarkRC4_128 470 421 -10.43% BenchmarkRC4_1K 3123 3275 +4.87% BenchmarkRC4_8K 26351 25866 -1.84% benchmark old MB/s new MB/s speedup BenchmarkRC4_128 272.22 303.40 1.11x BenchmarkRC4_1K 327.80 312.58 0.95x BenchmarkRC4_8K 307.24 313.00 1.02x For comparison, on the same machine, openssl 0.9.8r reports its rc4 speed as somewhat under 350 MB/s for both 1K and 8K. The Core i5 performance can be boosted another 20%, but only by making the Xeon performance significantly slower. On an Intel Xeon E5520: benchmark old ns/op new ns/op delta BenchmarkRC4_128 774 417 -46.12% BenchmarkRC4_1K 6121 3200 -47.72% BenchmarkRC4_8K 48394 25151 -48.03% benchmark old MB/s new MB/s speedup BenchmarkRC4_128 165.18 306.84 1.86x BenchmarkRC4_1K 167.28 319.92 1.91x BenchmarkRC4_8K 167.29 321.89 1.92x For comparison, on the same machine, openssl 1.0.1 (which uses a different implementation than 0.9.8r) reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K. It is using SIMD instructions to do more in parallel. So there's still some improvement to be had, but even so, this is almost 2x faster than what it replaced. -- 386 -- On a MacBookPro10,2 (Core i5): benchmark old ns/op new ns/op delta BenchmarkRC4_128 3491 421 -87.94% BenchmarkRC4_1K 28063 3205 -88.58% BenchmarkRC4_8K 220392 25228 -88.55% benchmark old MB/s new MB/s speedup BenchmarkRC4_128 36.66 303.81 8.29x BenchmarkRC4_1K 36.49 319.42 8.75x BenchmarkRC4_8K 36.73 320.90 8.74x On an Intel Xeon E5520: benchmark old ns/op new ns/op delta BenchmarkRC4_128 2268 524 -76.90% BenchmarkRC4_1K 18161 4137 -77.22% BenchmarkRC4_8K 142396 32350 -77.28% benchmark old MB/s new MB/s speedup BenchmarkRC4_128 56.42 244.13 4.33x BenchmarkRC4_1K 56.38 247.46 4.39x BenchmarkRC4_8K 56.86 250.26 4.40x

Patch Set 1 #

Patch Set 2 : diff -r 837570b3d8bd https://code.google.com/p/go/ #

Patch Set 3 : diff -r 837570b3d8bd https://code.google.com/p/go/ #

Patch Set 4 : diff -r 837570b3d8bd https://go.googlecode.com/hg/ #

Patch Set 5 : diff -r 536b2cc91dfc https://go.googlecode.com/hg/ #

Patch Set 6 : diff -r c246dbf446db https://code.google.com/p/go/ #

Patch Set 7 : diff -r c246dbf446db https://code.google.com/p/go/ #

Total comments: 2

Patch Set 8 : diff -r ff779e477085 https://code.google.com/p/go/ #

Patch Set 9 : diff -r ff779e477085 https://code.google.com/p/go/ #

Unified diffs Side-by-side diffs Delta from patch set Stats (+168 lines, -93 lines) Patch
M src/pkg/crypto/rc4/rc4_386.s View 1 2 3 4 5 6 7 1 chunk +33 lines, -35 lines 0 comments Download
M src/pkg/crypto/rc4/rc4_amd64.s View 1 2 3 4 5 1 chunk +97 lines, -44 lines 0 comments Download
M src/pkg/crypto/rc4/rc4_asm.go View 1 2 3 4 5 1 chunk +1 line, -1 line 0 comments Download
M src/pkg/crypto/rc4/rc4_ref.go View 1 2 3 4 5 1 chunk +1 line, -1 line 0 comments Download
M src/pkg/crypto/rc4/rc4_test.go View 1 2 3 4 5 6 2 chunks +36 lines, -12 lines 0 comments Download

Messages

Total messages: 5
rsc
Hello agl@golang.org (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go/
12 years ago (2013-03-21 05:30:57 UTC) #1
agl1
This seems to fix an issue with the old amd64 asm on older Intel CPUs, ...
12 years ago (2013-03-21 11:11:59 UTC) #2
rsc
On Thu, Mar 21, 2013 at 7:11 AM, <agl@golang.org> wrote: > This seems to fix ...
12 years ago (2013-03-21 13:51:21 UTC) #3
agl1
On Thu, Mar 21, 2013 at 9:51 AM, Russ Cox <rsc@golang.org> wrote: > By an ...
12 years ago (2013-03-21 14:25:07 UTC) #4
rsc
12 years ago (2013-03-21 15:25:15 UTC) #5
*** Submitted as https://code.google.com/p/go/source/detail?r=97e26e028a6e ***

crypto/rc4: faster amd64, 386 implementations

-- amd64 --

On a MacBookPro10,2 (Core i5):

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          470          421  -10.43%
BenchmarkRC4_1K          3123         3275   +4.87%
BenchmarkRC4_8K         26351        25866   -1.84%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       272.22       303.40    1.11x
BenchmarkRC4_1K        327.80       312.58    0.95x
BenchmarkRC4_8K        307.24       313.00    1.02x

For comparison, on the same machine, openssl 0.9.8r reports
its rc4 speed as somewhat under 350 MB/s for both 1K and 8K.
The Core i5 performance can be boosted another 20%, but only
by making the Xeon performance significantly slower.

On an Intel Xeon E5520:

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          774          417  -46.12%
BenchmarkRC4_1K          6121         3200  -47.72%
BenchmarkRC4_8K         48394        25151  -48.03%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       165.18       306.84    1.86x
BenchmarkRC4_1K        167.28       319.92    1.91x
BenchmarkRC4_8K        167.29       321.89    1.92x

For comparison, on the same machine, openssl 1.0.1
(which uses a different implementation than 0.9.8r)
reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K.
It is using SIMD instructions to do more in parallel.

So there's still some improvement to be had, but even so,
this is almost 2x faster than what it replaced.

-- 386 --

On a MacBookPro10,2 (Core i5):

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128         3491          421  -87.94%
BenchmarkRC4_1K         28063         3205  -88.58%
BenchmarkRC4_8K        220392        25228  -88.55%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128        36.66       303.81    8.29x
BenchmarkRC4_1K         36.49       319.42    8.75x
BenchmarkRC4_8K         36.73       320.90    8.74x

On an Intel Xeon E5520:

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128         2268          524  -76.90%
BenchmarkRC4_1K         18161         4137  -77.22%
BenchmarkRC4_8K        142396        32350  -77.28%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128        56.42       244.13    4.33x
BenchmarkRC4_1K         56.38       247.46    4.39x
BenchmarkRC4_8K         56.86       250.26    4.40x

R=agl
CC=golang-dev
https://codereview.appspot.com/7547050

https://codereview.appspot.com/7547050/diff/16001/src/pkg/crypto/rc4/rc4_386.s
File src/pkg/crypto/rc4/rc4_386.s (right):

https://codereview.appspot.com/7547050/diff/16001/src/pkg/crypto/rc4/rc4_386....
src/pkg/crypto/rc4/rc4_386.s:1: // Copyright 2013 The Go Authors. All rights
reserved.
On 2013/03/21 11:11:59, agl1 wrote:
> This looks like a port of the old amd64 code. If mercurial can be convinced
that
> it's a copy without too much effort, that would be nice, but not important.

Done. (Thought I'd already done it but maybe Mercurial forgot somehow.)
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b