Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes: Compare should use runtime.cmpstring or assembly #5354

Closed
bradfitz opened this issue Apr 26, 2013 · 6 comments
Closed

bytes: Compare should use runtime.cmpstring or assembly #5354

bradfitz opened this issue Apr 26, 2013 · 6 comments

Comments

@bradfitz
Copy link
Contributor

bytes.Compare is implemented in Go.

We should either make the compiler generate better code for it, or have it implemented
in terms of runtime.cmpstring (from runtime/string.goc).  Or assembly.

On one application I'm optimizing now, 10% of CPU time is spent within bytes.Compare
(and rising, as other parts improve).

Related: issue #3751 (for strings.Index, bytes.IndexByte, etc)
@julienschmidt
Copy link
Contributor

Comment 1:

I have no idea why, but I was in the mood to try implementing this in Assembly (amd64).
I'm pretty sure it's far from perfect since I have only written very few Intel 80386
assembly before.
It's more or less a direct port of the Go code. Using 64-bit MMX compare or SSE 4.2
instructions [1] should result in much better performance.
If someone has a use for it, do what ever you like with it. The Benchmarks (ported from
string compare) are also attached.
1: http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
Performance:
OLD
BenchmarkCompareBytesEqual             50000000               40.8 ns/op
BenchmarkCompareBytesIdentical         50000000               40.1 ns/op
BenchmarkCompareBytesSameLength       100000000               17.4 ns/op
BenchmarkCompareBytesDifferentLength  100000000               17.3 ns/op
BenchmarkCompareBytesBigUnaligned          2000            1255571 ns/op    835.15 MB/s
BenchmarkCompareBytesBig                   2000            1244571 ns/op    842.53 MB/s
NEW
BenchmarkCompareBytesEqual            100000000               17.5 ns/op
BenchmarkCompareBytesIdentical        500000000               4.13 ns/op
BenchmarkCompareBytesSameLength       200000000               9.52 ns/op
BenchmarkCompareBytesDifferentLength  200000000               8.85 ns/op
BenchmarkCompareBytesBigUnaligned          2000             927053 ns/op   1131.10 MB/s
BenchmarkCompareBytesBig                   2000             927553 ns/op   1130.49 MB/s

Attachments:

  1. compare_amd64.s (1469 bytes)
  2. compare_test.go (2413 bytes)

@julienschmidt
Copy link
Contributor

Comment 2:

Another try. It has also a bug fixed where the result was 0 if the array pointer was
identical but not the length.
OLD
BenchmarkCompareBytesEqual               50000000                40.3 ns/op
BenchmarkCompareBytesToNil              500000000                7.65 ns/op
BenchmarkCompareBytesEmpty              200000000                8.23 ns/op
BenchmarkCompareBytesIdentical           50000000                39.9 ns/op
BenchmarkCompareBytesSameLength         100000000                17.4 ns/op
BenchmarkCompareBytesDifferentLength    100000000                17.0 ns/op
BenchmarkCompareBytesBigUnaligned            2000             1238070 ns/op    846.95
MB/s
BenchmarkCompareBytesBig                     2000             1241571 ns/op    844.56
MB/s
BenchmarkCompareBytesBigIdentical            2000             1232570 ns/op
NEW
BenchmarkCompareBytesEqual              100000000                16.4 ns/op
BenchmarkCompareBytesToNil              500000000                4.99 ns/op
BenchmarkCompareBytesEmpty              500000000                4.69 ns/op
BenchmarkCompareBytesIdentical          500000000                4.69 ns/op
BenchmarkCompareBytesSameLength         200000000                8.54 ns/op
BenchmarkCompareBytesDifferentLength    200000000                9.11 ns/op
BenchmarkCompareBytesBigUnaligned            5000              655437 ns/op   1599.83
MB/s
BenchmarkCompareBytesBig                     5000              642836 ns/op   1631.19
MB/s
BenchmarkCompareBytesBigIdentical       500000000                4.69 ns/op
And BenchmarkCompareBytesBig for different lengths:
OLD
len = 1       50000000        39.8 ns/op       351.39 MB/s
len = 2       50000000        39.8 ns/op       352.09 MB/s
len = 4       50000000        39.7 ns/op       352.45 MB/s
len = 8       50000000        39.8 ns/op       351.92 MB/s
len = 16      50000000        57.5 ns/op       487.27 MB/s
len = 32      50000000        74.0 ns/op       567.69 MB/s
len = 64      20000000         106 ns/op       658.48 MB/s
len = 128     10000000         188 ns/op       743.45 MB/s
len = 256      5000000         335 ns/op       792.09 MB/s
len = 512      5000000         632 ns/op       819.31 MB/s
len = 1024     1000000        1241 ns/op       834.76 MB/s
len = 2048     1000000        2435 ns/op       845.13 MB/s
len = 4096      500000        4840 ns/op       847.47 MB/s
len = 8192      200000        9660 ns/op       849.23 MB/s
len = 16384     100000       19221 ns/op       852.92 MB/s
len = 32768      50000       38482 ns/op       851.67 MB/s
len = 65536      20000       76704 ns/op       854.55 MB/s
len = 131072     10000      153808 ns/op       852.24 MB/s
len = 262144      5000      307017 ns/op       853.86 MB/s
len = 524288      5000      615835 ns/op       851.36 MB/s
len = 1048576     2000     1231070 ns/op       851.77 MB/s
NEW
len = 1      100000000        16.5 ns/op       846.38 MB/s
len = 2      100000000        16.4 ns/op       852.57 MB/s
len = 4      100000000        16.4 ns/op       852.57 MB/s
len = 8      100000000        16.5 ns/op       848.44 MB/s
len = 16     100000000        28.7 ns/op       975.89 MB/s
len = 32      50000000        46.9 ns/op       895.85 MB/s
len = 64      50000000        63.5 ns/op      1102.99 MB/s
len = 128     20000000         105 ns/op      1330.09 MB/s
len = 256     10000000         179 ns/op      1481.81 MB/s
len = 512      5000000         377 ns/op      1373.93 MB/s
len = 1024     5000000         731 ns/op      1415.61 MB/s
len = 2048     1000000        1394 ns/op      1476.24 MB/s
len = 4096     1000000        2594 ns/op      1581.25 MB/s
len = 8192      500000        4998 ns/op      1641.36 MB/s
len = 16384     200000       11115 ns/op      1474.86 MB/s
len = 32768     100000       19941 ns/op      1643.54 MB/s
len = 65536      50000       39982 ns/op      1639.43 MB/s
len = 131072     20000       82104 ns/op      1596.52 MB/s
len = 262144     10000      162009 ns/op      1618.12 MB/s
len = 524288      5000      321418 ns/op      1631.21 MB/s
len = 1048576     5000      640836 ns/op      1636.28 MB/s
Okay, I stop abusing this issue report now ;)
I hope this helps as a basis for a proper CL. I see no reason why the same function
couldn't be also used to improve string comparison.

Attachments:

  1. compare_amd64.s (2079 bytes)
  2. compare_test.go (3800 bytes)

@bradfitz
Copy link
Contributor Author

Comment 3:

Nice.
Yes, I agree that the same assembly should be used for both bytes.Compare and for
runtime.cmpstring.
Please send a CL once Go 1.1 comes out.

Status changed to Started.

@randall77
Copy link
Contributor

Comment 4:

Julien, I was in the mood also, so I hacked up a patch using SSE to do 16 bytes at a time
https://golang.org/cl/8853048
Similar to the memeq implementation I did.  It's pretty fast.  Still needs more testing
and 386/arm implementations.
As an aside, it would be nice to have an easy way of writing a single function in
assembly for one arch and c (or go) for another arch.  Right now it is an
all-arch-or-no-arch conversion from c to assembly.  I think you could hack it up with
+build tags but it would require a lot of files.  Maybe per-function +build pragmas?

@bradfitz
Copy link
Contributor Author

Comment 5:

Owner changed to @randall77.

@randall77
Copy link
Contributor

Comment 6:

This issue was closed by revision b3946dc.

Status changed to Fixed.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants