cmd/asm: ARM64 NEON floating point instructions (VFABD VFMAX, VFMAXNM, VFMINNM VFMIN VFADD, VFSUB VFMUL, VFDIV VFMLA, VFMLS VCVT*) #41092

paultag · 2020-08-28T12:49:55Z

What version of Go are you using (`go version`)?

$ go version
go version go1.15 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ubuntu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ubuntu/xx/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ubuntu/xx/go/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build141213394=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Whilst writing some NEON code, I found myself in need of floating point operations in NEON. I was able to load my data to the V* registers (and write it out!), but when I attempted to use VF* instructions, such as VFADD or VFMUL, those opcodes have not been implemented by any intrepid engineer on arm64.

What did you expect to see?

Vectorized floating point addition or multiplication.

What did you see instead?

unrecognized instruction "VFADD"

Test code

neon.go

package fptest

// AddFloat is implemented in neon_arm64.s; it sets dst[i] = a[i] + b[i] for i < 4.
func AddFloat(a, b, dst []float32)

neon_test.go

package fptest_test

import (
        "testing"
        fptest "."

        "github.com/stretchr/testify/assert"
)

func TestAddFloat(t *testing.T) {
        dst := make([]float32, 4)
        fptest.AddFloat([]float32{1, 2, 3, 4}, []float32{10, 20, 30, 40}, dst)
        assert.Equal(t, []float32{11, 22, 33, 44}, dst)
}

neon_arm64.s

// func AddFloat(a []float32, b []float32, dst []float32)
TEXT ·AddFloat(SB), $0-72
    // For the sake of simplicity, this only handles the first 4 elements.

    // Load the base addresses of a, b and dst into R8, R9 and R10.
    MOVD a+0(FP),    R8
    MOVD b+24(FP),   R9
    MOVD dst+48(FP), R10

    // Load [4]float32 from a and b into V1 and V2.
    VLD1 (R8), [V1.S4]
    VLD1 (R9), [V2.S4]

    // This is the instruction the assembler rejects.
    VFADD V1.S4, V2.S4, V1.S4
    // Raw encoding of the same instruction (fadd v1.4s, v2.4s, v1.4s):
    // WORD $0x4e21d441;

    // Write [4]float32 to dst.
    VST1 [V1.S4], (R10)

    RET


davecheney · 2020-08-28T16:49:08Z

Please see #40725

paultag · 2020-08-28T16:53:44Z

(I think the above comment is a reference to the following comment from that thread:)

Please don't take this as a criticism, but as an observer of a number of this class of request, the best results are obtained when the OP, you in this case, can enumerate exactly which instructions to add. I have no explanation why requests for all XXX instructions are unsuccessful, but encourage you to list precisely the instructions you would like to see added as there is anecdotal evidence that requests formed in this way are resolved faster.

I'll go shopping for opcodes, thanks @davecheney. I was going to try to put together a changeset to go with this issue, but figured I'd file it ahead of that in case it wound up being an insurmountable pile of internals changes. I'll list opcodes I'm in need of here, and see if I can produce a changeset (famous last bugreport words)

paultag · 2020-08-28T18:48:47Z

After looking at a few similar changes adding arm64 opcodes, I suspect I'm in over my head. I'm still going to try my hand at a changeset for sheer sport, but I wouldn't block on me if anyone competent comes across this issue.

These are the most burning opcodes to help remove some bottlenecks I've hit:

VFABD
VFMAX, VFMAXNM, VFMINNM, VFMIN
VFADD, VFSUB
VFMUL, VFDIV
VFMLA, VFMLS
VCVT* (I'm not sure what the correct mnemonics in idiomatic Go assembly would be here. The variants I'd want are "float to unsigned integer", "float to signed integer", "signed integer to float" and "unsigned integer to float".)

cagedmantis · 2020-08-31T18:37:44Z

/cc @cherrymui

Clement-Jean · 2023-12-20T08:04:23Z

@cherrymui Is there any way I can help on this? I'm also interested in this. Just don't really know where to start.

Clement-Jean · 2023-12-20T08:53:46Z

For those reading this who can't wait for the implementation, you can do the following:

Write the intrinsic in C/C++ or whatever language lets you do it, e.g. for VFADD 32x4:

float32x4_t t;

t = vaddq_f32(t, t);

Compile and disassemble the binary with otool or objdump, e.g. with otool:

$ gcc main.c
$ otool -tvj a.out
...
...        4e20d420        fadd.4s v0, v1, v0
...

The -j flag prints the opcode bytes.

Take the opcode and emit it with the WORD directive in your assembly code:

WORD $0x4e20d420 // fadd.4s v0, v1, v0

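⚠️ Be aware of operand order! The disassembly above lists the destination register first, whereas Go assembly generally lists it last.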

paultag · 2023-12-20T14:19:38Z

I totally forgot to post something similar after I did my work. I eventually published the code I was working on; here's a link for future travelers. (I wound up hacking up a simple assembler in Python using the arm64 opcode PDF. The code got lost a few machines ago, but it wasn't particularly smart, and this compiler method is a better idea.) Thanks for the post and the reminder, @Clement-Jean.

https://github.com/hztools/go-sdr/blob/9809d5729f372dde16038710b13f70ff484baaf8/internal/simd/mult_simd_arm64.s#L89

cherrymui · 2023-12-20T16:06:15Z

@Clement-Jean thank you for being interested in contributing! The source code of the ARM64 assembler backend is at https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/arm64/ . It is not hard to add support for a new instruction, given that we already support similar instructions. Thanks.

paultag changed the title ~~ARM64 NEON floating point instructions (VFADD, VFMUL, etc)~~ cmd/asm: ARM64 NEON floating point instructions (VFADD, VFMUL, etc) Aug 28, 2020

paultag changed the title ~~cmd/asm: ARM64 NEON floating point instructions (VFADD, VFMUL, etc)~~ cmd/asm: ARM64 NEON floating point instructions (VFABD VFMAX, VFMAXNM, VFMINNM VFMIN VFADD, VFSUB VFMUL, VFDIV VFMLA, VFMLS VCVT*) Aug 28, 2020

cagedmantis added the NeedsInvestigation label Aug 31, 2020

cagedmantis added this to the Backlog milestone Aug 31, 2020

jacksonrnewhouse mentioned this issue Nov 1, 2020

cmd/asm: ARM64 NEON unsigned min and max instructions (VUMAX, VUMIN) #42326

Closed

erifan mentioned this issue Mar 2, 2021

cmd/asm: refactor the framework of the arm64 assembler #44734

Open

gopherbot added the compiler/runtime label Jul 13, 2022

cherrymui added the help wanted label Dec 20, 2023
