Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: intrinsify math/bits.RotateLeft{32,64} on all architectures #31265

Open
mundaym opened this issue Apr 4, 2019 · 9 comments
Open
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@mundaym
Copy link
Member

mundaym commented Apr 4, 2019

#21536 (comment) reminded me about this. Currently the intrinsic is only implemented on amd64, arm64, ppc64 and s390x. We should enable it everywhere. It will need tests in tests/codegen. Could be a good beginner compiler issue to tackle.

CLs so far:

@josharian
Copy link
Contributor

cc @benshi001 for arm

@bcmills bcmills added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 12, 2019
@odeke-em
Copy link
Member

Kindly paging @bmkessler who might also be interested in this issue.

@gopherbot
Copy link

Change https://golang.org/cl/182359 mentions this issue: cmd/compile: intrinsify RotateLeft32 on wasm

@gopherbot
Copy link

Change https://golang.org/cl/188697 mentions this issue: cmd/compile: optimize ARM's math.bits.RotateLeft32

@gopherbot
Copy link

Change https://golang.org/cl/189277 mentions this issue: cmd/compile: optimize 386's math.bits.RotateLeft32

gopherbot pushed a commit that referenced this issue Aug 28, 2019
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates #31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Aug 31, 2019
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates #31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
tomocy pushed a commit to tomocy/go that referenced this issue Sep 1, 2019
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates golang#31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
tomocy pushed a commit to tomocy/go that referenced this issue Sep 1, 2019
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates golang#31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates golang#31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates golang#31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
@odeke-em
Copy link
Member

odeke-em commented Oct 17, 2019

Alright, and doing a checkup here on the statuses of the various GOARCHes

GOARCH CLs Done
386 89277
amd64 132435
arm 188697
arm64 122542
mips
mips64
mips64le
mipsle
ppc64 163760
ppc64le 163760
s390x 133035
wasm 182359

From the table above, we are currently lacking for the GOARCH variants of mips* and ppc64le.
Kindly pinging you @ceseo, might you have some bandwidth to work on a CL for ppc64le? @cherrymui @randall77 might y'all be available for mips* or have suggestions of other experts? I think we can get this in for Go1.14 :) And thank you everyone who has submitted CLs and reviewed or discussed getting CLs to solve this, great stuff!

@cherrymui
Copy link
Member

CL 163760 should cover both ppc64 and ppc64le, right?

On MIPS, the rotate instruction is rather new. We still support older machines. I guess we could do something based on GOMIPS(64) value, but we need to introduce the values first. Two shifts + an OR doesn't sounds too bad.

@laboger
Copy link
Contributor

laboger commented Oct 17, 2019 via email

@mengzhuo
Copy link
Contributor

mengzhuo commented Oct 17, 2019

CL 163760 should cover both ppc64 and ppc64le, right?

On MIPS, the rotate instruction is rather new. We still support older machines. I guess we could do something based on GOMIPS(64) value, but we need to introduce the values first. Two shifts + an OR doesn't sounds too bad.

The minimal requirement of MIPS64X is MIPS III, which is introduced in 1991 according to Wikipedia.

I'm thinking update Go minimal requirement of MIPS64X to MIPS64 R2 (not MIPS II) as MIPS32X do.

And we can adding compile flag

GOMIPS64=r*

r2 = release 2
r6 = release 6

For the future MIPS64 r6, which is whole new instruction set.

We already had GOMIPS64=hardfloat and won't conflict with r* flag by separate them with comma, i.e. GOMIPS64=hardfloat,r2

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

8 participants