Skip to content

runtime: add asm version of cmpstring and bytes·Compare for ppc64 #10007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
josharian opened this issue Feb 26, 2015 · 4 comments
Closed

runtime: add asm version of cmpstring and bytes·Compare for ppc64 #10007

josharian opened this issue Feb 26, 2015 · 4 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@josharian
Copy link
Contributor

Currently only amd64 and 386 have asm versions of cmpstring and bytes·Compare. These can be important for performance (see e.g. #10000). The compiler generates ok but not amazing code for these. Here's the ARM output for cmpstring:

"".cmpstring t=1 size=268 value=0 args=0x14 locals=0x0
    0x0000 00000 (src/runtime/noasm.go:13)  TEXT    "".cmpstring+0(SB),0,$0-20
    0x0000 00000 (src/runtime/noasm.go:13)  MOVW    8(R10),R1
    0x0004 00004 (src/runtime/noasm.go:13)  CMP R1,R13,
    0x0008 00008 (src/runtime/noasm.go:13)  MOVW.LS R14,R3
    0x000c 00012 (src/runtime/noasm.go:13)  CALL.LS ,runtime.morestack_noctxt(SB)
    0x0010 00016 (src/runtime/noasm.go:13)  BLS ,0
    0x0014 00020 (src/runtime/noasm.go:13)  MOVW.W  R14,-4(R13)
    0x0018 00024 (src/runtime/noasm.go:13)  FUNCDATA    $0,gclocals·f271231f400e778e0f59be25f7a26a56+0(SB)
    0x0018 00024 (src/runtime/noasm.go:13)  FUNCDATA    $1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
    0x0018 00024 (src/runtime/noasm.go:13)  MOVW    $0,R0
    0x001c 00028 (src/runtime/noasm.go:14)  MOVW    "".s1+4(FP),R7
    0x0020 00032 (src/runtime/noasm.go:15)  MOVW    "".s2+12(FP),R3
    0x0024 00036 (src/runtime/noasm.go:15)  CMP R7,R3,
    0x0028 00040 (src/runtime/noasm.go:15)  BGE ,264
    0x002c 00044 (src/runtime/noasm.go:16)  MOVW    "".s2+12(FP),R7
    0x0030 00048 (src/runtime/noasm.go:18)  MOVW    $0,R3
    0x0034 00052 (src/runtime/noasm.go:18)  MOVW    R3,R4
    0x0038 00056 (src/runtime/noasm.go:18)  CMP R7,R3,
    0x003c 00060 (src/runtime/noasm.go:18)  BGE $0,196
    0x0040 00064 (src/runtime/noasm.go:19)  MOVW    $"".s2+8(FP),R0
    0x0044 00068 (src/runtime/noasm.go:19)  MOVW    R3,R1
    0x0048 00072 (src/runtime/noasm.go:19)  MOVW    4(R0),R2
    0x004c 00076 (src/runtime/noasm.go:19)  CMP R2,R3,
    0x0050 00080 (src/runtime/noasm.go:19)  BLO $1,92
    0x0054 00084 (src/runtime/noasm.go:19)  PCDATA  $0,$0
    0x0054 00084 (src/runtime/noasm.go:19)  CALL    ,runtime.panicindex(SB)
    0x0058 00088 (src/runtime/noasm.go:19)  UNDEF   ,
    0x005c 00092 (src/runtime/noasm.go:19)  MOVW    0(R0),R0
    0x0060 00096 (src/runtime/noasm.go:19)  MOVBU   R3<<0(R0),R5
    0x0064 00100 (src/runtime/noasm.go:19)  MOVW    $"".s1+0(FP),R0
    0x0068 00104 (src/runtime/noasm.go:19)  MOVW    R3,R1
    0x006c 00108 (src/runtime/noasm.go:19)  MOVW    4(R0),R2
    0x0070 00112 (src/runtime/noasm.go:19)  CMP R2,R3,
    0x0074 00116 (src/runtime/noasm.go:19)  BLO $1,128
    0x0078 00120 (src/runtime/noasm.go:19)  PCDATA  $0,$0
    0x0078 00120 (src/runtime/noasm.go:19)  CALL    ,runtime.panicindex(SB)
    0x007c 00124 (src/runtime/noasm.go:19)  UNDEF   ,
    0x0080 00128 (src/runtime/noasm.go:19)  MOVW    0(R0),R0
    0x0084 00132 (src/runtime/noasm.go:19)  MOVBU   R3<<0(R0),R6
    0x0088 00136 (src/runtime/noasm.go:19)  MOVBU   R5,R5
    0x008c 00140 (src/runtime/noasm.go:20)  CMP R5,R6,
    0x0090 00144 (src/runtime/noasm.go:20)  BHS ,160
    0x0094 00148 (src/runtime/noasm.go:21)  MOVW    $-1,R0
    0x0098 00152 (src/runtime/noasm.go:21)  MOVW    R0,"".~r2+16(FP)
    0x009c 00156 (src/runtime/noasm.go:21)  MOVW.P  4(R13),R15
    0x00a0 00160 (src/runtime/noasm.go:23)  MOVBU   R6,R4
    0x00a4 00164 (src/runtime/noasm.go:23)  MOVBU   R5,R2
    0x00a8 00168 (src/runtime/noasm.go:23)  CMP R2,R4,
    0x00ac 00172 (src/runtime/noasm.go:23)  BLS ,188
    0x00b0 00176 (src/runtime/noasm.go:24)  MOVW    $1,R0
    0x00b4 00180 (src/runtime/noasm.go:24)  MOVW    R0,"".~r2+16(FP)
    0x00b8 00184 (src/runtime/noasm.go:24)  MOVW.P  4(R13),R15
    0x00bc 00188 (src/runtime/noasm.go:18)  ADD $1,R3,R3
    0x00c0 00192 (src/runtime/noasm.go:18)  JMP ,52
    0x00c4 00196 (src/runtime/noasm.go:27)  MOVW    "".s1+4(FP),R3
    0x00c8 00200 (src/runtime/noasm.go:27)  MOVW    "".s2+12(FP),R4
    0x00cc 00204 (src/runtime/noasm.go:27)  CMP R4,R3,
    0x00d0 00208 (src/runtime/noasm.go:27)  BGE ,224
    0x00d4 00212 (src/runtime/noasm.go:28)  MOVW    $-1,R0
    0x00d8 00216 (src/runtime/noasm.go:28)  MOVW    R0,"".~r2+16(FP)
    0x00dc 00220 (src/runtime/noasm.go:28)  MOVW.P  4(R13),R15
    0x00e0 00224 (src/runtime/noasm.go:30)  MOVW    "".s1+4(FP),R2
    0x00e4 00228 (src/runtime/noasm.go:30)  MOVW    "".s2+12(FP),R3
    0x00e8 00232 (src/runtime/noasm.go:30)  CMP R3,R2,
    0x00ec 00236 (src/runtime/noasm.go:30)  BLE ,252
    0x00f0 00240 (src/runtime/noasm.go:31)  MOVW    $1,R0
    0x00f4 00244 (src/runtime/noasm.go:31)  MOVW    R0,"".~r2+16(FP)
    0x00f8 00248 (src/runtime/noasm.go:31)  MOVW.P  4(R13),R15
    0x00fc 00252 (src/runtime/noasm.go:33)  MOVW    $0,R0
    0x0100 00256 (src/runtime/noasm.go:33)  MOVW    R0,"".~r2+16(FP)
    0x0104 00260 (src/runtime/noasm.go:33)  MOVW.P  4(R13),R15
    0x0108 00264 (src/runtime/noasm.go:18)  JMP ,48

At first glance, there are extraneous panicindex calls, and we could be doing 32 bit comparisons instead of 8 bit comparisons as we loop. And I'm sure it could be optimized further.

cc @davecheney

@josharian
Copy link
Contributor Author

Here's a first pass at an arm version for cmpstring. The same code works for bytes.Compare, so we just need a bit of glue. It doesn't do 32 bits at a time, but it's still a 3x improvement. I'm putting it up here because I'm not sure when I'll finish it, and maybe someone else will run with it in the meantime.

#include "textflag.h"

TEXT runtime·cmpstring(SB),NOSPLIT,$-4-20
    MOVW    s1_base+0(FP), R2
    MOVW    s1_len+4(FP), R0
    MOVW    s2_base+8(FP), R3
    MOVW    s2_len+12(FP), R1
    CMP R0, R1
    MOVW.LT R0, R1 // R1 is min length

    ADD R2, R1, R6 // R2 is current byte in s1, R6 is last byte in s1 to compare
loop:
    CMP R2, R6
    BEQ samebytes // all compared bytes were the same; compare lengths
    MOVBU.P 1(R2), R4
    MOVBU.P 1(R3), R5
    CMP R4, R5
    BEQ loop
    // bytes differed
    MOVW.LT $1, R8
    MOVW.GT $-1, R8
    MOVW    R8, ret+16(FP)
    RET
samebytes:
    CMP R0, R1
    MOVW.LT $1, R8
    MOVW.GT $-1, R8
    MOVW.EQ $0, R8
    MOVW    R8, ret+16(FP)
    RET

davecheney added a commit that referenced this issue Mar 25, 2015
Update #10007

Implement runtime.cmpstring and bytes.Compare in asm for arm.

benchmark                                old ns/op     new ns/op     delta
BenchmarkCompareBytesEqual               254           91.4          -64.02%
BenchmarkCompareBytesToNil               41.5          37.6          -9.40%
BenchmarkCompareBytesEmpty               40.7          37.6          -7.62%
BenchmarkCompareBytesIdentical           255           96.3          -62.24%
BenchmarkCompareBytesSameLength          125           60.9          -51.28%
BenchmarkCompareBytesDifferentLength     133           60.9          -54.21%
BenchmarkCompareBytesBigUnaligned        17985879      5669706       -68.48%
BenchmarkCompareBytesBig                 17097634      4926798       -71.18%
BenchmarkCompareBytesBigIdentical        16861941      4389206       -73.97%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkCompareBytesBigUnaligned     58.30        184.95       3.17x
BenchmarkCompareBytesBig              61.33        212.83       3.47x
BenchmarkCompareBytesBigIdentical     62.19        238.90       3.84x

This is a collaboration between Josh Bleecher Snyder and myself.

Change-Id: Ib3944b8c410d0e12135c2ba9459bfe131df48edd
Reviewed-on: https://go-review.googlesource.com/8010
Reviewed-by: Keith Randall <khr@golang.org>
@rsc
Copy link
Contributor

rsc commented Apr 10, 2015

Reducing the scope of this bug to be just about arm.

We might want to do something for arm in Go 1.5, since @mwhudson is putting string compares into interface checks, although it's only on the slow path.

arm64 and ppc64 are not going to be completed for Go 1.5 and are missing tons of other assembly anyway.

@rsc rsc changed the title runtime: add asm version of cmpstring and bytes·Compare for arm and ppc runtime: add asm version of cmpstring and bytes·Compare Apr 10, 2015
@rsc rsc added this to the Go1.5Maybe milestone Apr 10, 2015
@davecheney
Copy link
Contributor

Arm and arm64 is done, it's really just ppc64 left

On 10 Apr 2015, at 06:56, Russ Cox notifications@github.com wrote:

Reducing the scope of this bug to be just about arm.

We might want to do something for arm in Go 1.5, since @mwhudson is putting string compares into interface checks, although it's only on the slow path.

arm64 and ppc64 are not going to be completed for Go 1.5 and are missing tons of other assembly anyway.


Reply to this email directly or view it on GitHub.

@rsc rsc changed the title runtime: add asm version of cmpstring and bytes·Compare runtime: add asm version of cmpstring and bytes·Compare for ppc64 Jul 21, 2015
@rsc rsc modified the milestones: Go1.6, Go1.5Maybe Jul 21, 2015
@rsc rsc modified the milestones: Unplanned, Go1.6 Dec 5, 2015
@rsc rsc added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 26, 2016
@mundaym
Copy link
Member

mundaym commented Nov 7, 2016

Looks like this issue has been fixed on ppc64{,le}: 32d3b96 (and 1e28dce).

@bradfitz bradfitz closed this as completed Nov 7, 2016
@golang golang locked and limited conversation to collaborators Nov 7, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants