cmd/asm: No way to use the dup instruction in ARM64 assembly #65310

Closed
kovidgoyal opened this issue Jan 26, 2024 · 5 comments
Labels
arch-arm64, compiler/runtime (Issues related to the Go compiler and/or runtime.)

Comments

@kovidgoyal

Go version

go version go1.21.6 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/kovid/.cache/go-build'
GOENV='/home/kovid/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/kovid/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/kovid/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/kovid/work/simdstring/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3570585833=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Tried to find a way to use the dup ARM64 assembly instruction via Go assembly. It doesn't seem to exist. The closest I can find is VDUP, but that seems to take something known as a C_ELEM; I'm not sure what that is. If it's intended to map to dup, it should take general-purpose registers as operands. Here is a simple function to set all bytes in a register:

TEXT ·test_set1_epi8_asm_128(SB), NOSPLIT, $0-16
	MOVBU b+0(FP), R0
	VDUP R0, V0
	MOVD ans+8(FP), R0
	VST1 [V0.B8], (R0)
	RET

Trying to build it with go gives the error:
asm: illegal combination: 00004 VDUP R0, V0 REG NONE NONE VREG

Grepping for VDUP in the go assembler package gives:

502:	{AVDUP, C_ELEM, C_NONE, C_NONE, C_ARNG, 79, 4, 0, 0, 0},
503:	{AVDUP, C_ELEM, C_NONE, C_NONE, C_VREG, 80, 4, 0, 0, 0},
504:	{AVDUP, C_ZREG, C_NONE, C_NONE, C_ARNG, 82, 4, 0, 0, 0},

This shows that the assembler thinks it doesn't take regular registers as operands.

ARM64 reference: https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/DUP--general---Duplicate-general-purpose-register-to-vector-?lang=en

What did you see happen?

Got the error:

asm: illegal combination: 00004 	VDUP	R0, V0 REG NONE NONE VREG

What did you expect to see?

Expected it to generate instructions similar to

dup.8b	v0, w0

Maybe there is some other way to do this in Go assembly? If so would appreciate being enlightened.

@gopherbot added the compiler/runtime label Jan 26, 2024
@mauri870
Member

The obj docs for arm64 state:

Go adds a V prefix for most floating-point and SIMD instructions, except cryptographic extension instructions and floating-point(scalar) instructions. 

@cherrymui
Member

It does support integer registers.

504: {AVDUP, C_ZREG, C_NONE, C_NONE, C_ARNG, 82, 4, 0, 0, 0},

C_ZREG is an integer register or ZR; C_ARNG is an arrangement. So you need an arrangement as the destination, like VDUP R0, V1.B8.

Thanks.
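
To make the arrangement form concrete, here is a small Go model of what the instruction computes, with the adjusted assembly from the original report shown in a comment. This is only a sketch: `dupB8` is a hypothetical helper name, and the assembly in the comment is the reporter's function with the destination changed to `V0.B8`, which I have not assembled here.

```go
package main

import "fmt"

// Adjusted version of the function from the report, using an arrangement
// as the VDUP destination (assumption: untested, shown for illustration):
//
//	TEXT ·test_set1_epi8_asm_128(SB), NOSPLIT, $0-16
//		MOVBU b+0(FP), R0
//		VDUP R0, V0.B8
//		MOVD ans+8(FP), R0
//		VST1 [V0.B8], (R0)
//		RET

// dupB8 models what "VDUP R0, V0.B8" (ARM64 "dup v0.8b, w0") computes:
// the low byte of the general-purpose register is copied into each of
// the eight byte lanes of the destination vector.
func dupB8(r0 uint64) [8]byte {
	var v [8]byte
	b := byte(r0) // only the low 8 bits matter for a .B8 arrangement
	for i := range v {
		v[i] = b
	}
	return v
}

func main() {
	fmt.Printf("% x\n", dupB8(0x1234ab))
}
```

The arrangement suffix is what tells the assembler the lane width and count, which is why a bare `V0` destination is rejected.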

@kovidgoyal
Author

kovidgoyal commented Jan 31, 2024

You are correct, my mistake. However, there are a couple of other useful instructions that are missing:

  1. VCGT (> on vector registers), sometimes called CMGT; neither name is recognized by Go
  2. SHRN (shift right and narrow)
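
For readers unfamiliar with the first of these, the following is a rough Go model of the per-lane behaviour of the register form of CMGT on 16 byte lanes. The function name `cmgtB16` is made up for illustration; the point is only that the comparison is signed and that each result lane is all ones or all zeros.

```go
package main

import "fmt"

// cmgtB16 models "cmgt vd.16b, vn.16b, vm.16b": each byte lane is
// compared as a signed 8-bit integer, and the result lane is 0xFF
// where n > m and 0x00 otherwise.
func cmgtB16(n, m [16]byte) [16]byte {
	var d [16]byte
	for i := range d {
		if int8(n[i]) > int8(m[i]) {
			d[i] = 0xFF
		}
	}
	return d
}

func main() {
	var n, m [16]byte
	n[0], m[0] = 5, 3       // 5 > 3 holds
	n[1], m[1] = 0x80, 0x01 // 0x80 is -128 when signed, so this lane is false
	fmt.Printf("% x\n", cmgtB16(n, m))
}
```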

@cherrymui
Member

It is true that we don't yet support all vector instructions. You could write WORD directives for the bit encoding. Also feel free to send a CL to add the support. See also #44734 for a more general approach of adding new instructions. Thanks.
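
As a sketch of the WORD approach, the Go program below computes the 32-bit encodings so they can be pasted into assembly as `WORD $0x...`. The bit layouts are my reading of the Arm A64 encoding tables (Advanced SIMD three-same for CMGT, shift-by-immediate for SHRN), and the helper names are hypothetical; any value produced this way should be cross-checked against a disassembler before use.

```go
package main

import "fmt"

// encodeCMGT16B returns the encoding of "cmgt vd.16b, vn.16b, vm.16b"
// (assumed layout: Q=1, U=0, size=00, opcode=00110).
func encodeCMGT16B(rd, rn, rm uint32) uint32 {
	return 0x4e203400 | (rm&31)<<16 | (rn&31)<<5 | (rd & 31)
}

// encodeSHRN8B returns the encoding of "shrn vd.8b, vn.8h, #shift".
// For the 8h -> 8b form the immh:immb field holds 16 - shift,
// so shift must be in the range 1..8.
func encodeSHRN8B(rd, rn, shift uint32) uint32 {
	if shift < 1 || shift > 8 {
		panic("shift must be between 1 and 8 for the 8b form")
	}
	immhb := 16 - shift
	return 0x0f008400 | immhb<<16 | (rn&31)<<5 | (rd & 31)
}

func main() {
	fmt.Printf("WORD $0x%08x // cmgt v0.16b, v1.16b, v2.16b\n", encodeCMGT16B(0, 1, 2))
	fmt.Printf("WORD $0x%08x // shrn v0.8b, v0.8h, #4\n", encodeSHRN8B(0, 0, 4))
}
```

Keeping the mnemonic in a trailing comment next to each WORD makes the raw encoding much easier to review later.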

@kovidgoyal
Author

Yes, I have already implemented it in my code using WORD, I am just
pointing out that these are rather useful/common instructions that are
missing. CMGT is useful when searching for bytes in strings in a range
and SHRN is useful to speed up finding matching positions in the output
of CMEQ and CMGT and friends. Indeed, the ARM64 implementations in the
stdlib bytealg could probably be simplified/sped up by using SHRN.
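
The SHRN trick mentioned above can be illustrated with a plain Go model. This is a sketch of the general technique, not the code from my project or from bytealg: `cmeqB16`, `shrn4`, and `firstMatch` are hypothetical names, and the lanes are modelled in little-endian order. Narrowing each 16-bit lane by a right shift of 4 leaves one nibble per input byte, so the 64-bit result can be scanned with a single count-trailing-zeros.

```go
package main

import (
	"fmt"
	"math/bits"
)

// cmeqB16 models CMEQ against a broadcast byte: a lane is 0xFF where
// hay[i] == needle and 0x00 otherwise.
func cmeqB16(hay [16]byte, needle byte) [16]byte {
	var d [16]byte
	for i := range d {
		if hay[i] == needle {
			d[i] = 0xFF
		}
	}
	return d
}

// shrn4 models "shrn vd.8b, vn.8h, #4" followed by moving the low
// 64 bits of vd to a general-purpose register. Each input byte ends
// up represented by one nibble of the result.
func shrn4(mask [16]byte) uint64 {
	var out uint64
	for i := 0; i < 8; i++ {
		h := uint16(mask[2*i]) | uint16(mask[2*i+1])<<8 // one 16-bit lane
		out |= uint64(uint8(h>>4)) << (8 * i)           // shift right, keep low 8 bits
	}
	return out
}

// firstMatch returns the index of the first byte equal to needle,
// or -1 if there is none.
func firstMatch(hay [16]byte, needle byte) int {
	m := shrn4(cmeqB16(hay, needle))
	if m == 0 {
		return -1
	}
	return bits.TrailingZeros64(m) / 4 // four result bits per input byte
}

func main() {
	var hay [16]byte
	copy(hay[:], "hello, world 123")
	fmt.Println(firstMatch(hay, 'w'))
}
```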
