cmd/compile: memcombine should learn alignment of byte slices - easy mode - types #71778

Open
Jorropo opened this issue Feb 16, 2025 · 5 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Implementation Issues describing a semantics-preserving change to the Go implementation. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
@Jorropo
Member

Jorropo commented Feb 16, 2025

Go version

go version devel go1.25-b38415d7e9 Sat Feb 15 21:47:27 2025 -0800 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='0'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='riscv64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/tmp/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/hugo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3509629757=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/home/hugo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/hugo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GORISCV64='rva20u64'
GOROOT='/home/hugo/k/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/hugo/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/hugo/k/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.25-b38415d7e9 Sat Feb 15 21:47:27 2025 -0800'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

See this piece of code:

package a

import "encoding/binary"

type S struct {
        v uint64
        arr [8]byte
}

func f(s *S) uint64 {
        return binary.LittleEndian.Uint64(s.arr[:])
}

What did you see happen?

This code compiles to a series of 8-bit memory loads, which are then shifted left and finally bitwise ORed together.

What did you expect to see?

A single 64-bit memory load.
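As a rough source-level illustration of the two shapes described above (not actual compiler output), the sketch below spells out the byte-by-byte form next to the original call. The names loadBytewise and loadCombined are made up for this example; only S and the binary.LittleEndian.Uint64 call come from the report.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// S mirrors the struct from the report. The uint64 field gives the struct
// 8-byte alignment on 64-bit targets, so arr sits at an 8-byte-aligned offset.
type S struct {
	v   uint64
	arr [8]byte
}

// loadBytewise (hypothetical name) writes out the eight 8-bit loads,
// shifts, and ORs that the report says are emitted on riscv64.
func loadBytewise(s *S) uint64 {
	b := &s.arr
	return uint64(b[0]) |
		uint64(b[1])<<8 |
		uint64(b[2])<<16 |
		uint64(b[3])<<24 |
		uint64(b[4])<<32 |
		uint64(b[5])<<40 |
		uint64(b[6])<<48 |
		uint64(b[7])<<56
}

// loadCombined (hypothetical name) is the function f from the report.
// On amd64 the memcombine pass turns it into a single 64-bit load.
func loadCombined(s *S) uint64 {
	return binary.LittleEndian.Uint64(s.arr[:])
}

func main() {
	s := &S{arr: [8]byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08}}
	// Both forms must return the same value; only the generated code differs.
	fmt.Printf("%#x %#x\n", loadBytewise(s), loadCombined(s))
}
```

Both functions are semantically identical, which is why merging the loads is a pure optimization whenever the target can perform the wide load safely.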

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 16, 2025
@Jorropo
Member Author

Jorropo commented Feb 16, 2025

This came up while I was reviewing a library I was considering using in my project, where I found this cursed (invalid) use of unsafe:
https://github.com/xtaci/kcp-go/blob/5c80bedd4bd984dd71fb8c8669d91397235aec90/crypt.go#L297-L319

To fix all real-world use cases, it would be nice to have something like a zero-sized _ structs.Allign8 type, which would save 8 bytes over _ uint64, but that is another, unrelated proposal.

@gabyhelp gabyhelp added the Implementation Issues describing a semantics-preserving change to the Go implementation. label Feb 16, 2025
@magical
Contributor

magical commented Feb 17, 2025

Your snippet compiles down to a single `MOVQ` on amd64 but not riscv64. It looks to me like the memcombine pass is not enabled on riscv64 for some reason.

I don't see why alignment would matter - the RISC-V spec explicitly allows unaligned loads.

@randall77
Contributor

> I don't see why alignment would matter - the RISC-V spec explicitly allows unaligned loads.

It does, but it allows them to be implemented by taking a fault and simulating them in the OS. That would be very slow.

@Jorropo
Member Author

Jorropo commented Feb 18, 2025

The exact relevant lines of code are:

c.unalignedOK = true

// This optimization requires that the architecture has
// unaligned loads and unaligned stores.
if !f.Config.unalignedOK {
        return
}

unalignedOK is not set for riscv64:
case "riscv64":
        c.PtrSize = 8
        c.RegSize = 8
        c.lowerBlock = rewriteBlockRISCV64
        c.lowerValue = rewriteValueRISCV64
        c.lateLowerBlock = rewriteBlockRISCV64latelower
        c.lateLowerValue = rewriteValueRISCV64latelower
        c.registers = registersRISCV64[:]
        c.gpRegMask = gpRegMaskRISCV64
        c.fpRegMask = fpRegMaskRISCV64
        c.intParamRegs = paramIntRegRISCV64
        c.floatParamRegs = paramFloatRegRISCV64
        c.FPReg = framepointerRegRISCV64
        c.hasGReg = true

There is Zicclsm for GORISCV64=rva22u64 and later, which requires the CPU to support unaligned loads and stores; however, they are usually still extremely slow.

Since Linux v6.11 we can use the hwprobe API to check for misaligned load and store support using RISCV_HWPROBE_KEY_MISALIGNED_PERF.
torvalds/linux@c42e2f0
The result can be SLOW, FAST, or EMULATED; however, no RISCV64 profile yet requires fast misaligned memory operations, so we would need to extend GORISCV64 with ,fast-misaligned or something.


What I'm saying right now is that there are enough arm and riscv64 cores out there which will always be slow that people work around the compiler not being that smart by using unsafe liberally.
If memcombine could figure out that some loads and stores are aligned, it could merge them even with unalignedOK == false and help these chips.
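The decision being proposed can be sketched as a small predicate. This is only an illustration of the idea, not the compiler's code: canCombine, knownAlign, and width are hypothetical names, and the real memcombine pass would have to derive the known alignment from type information.

```go
package main

import "fmt"

// canCombine is a hypothetical helper, not part of cmd/compile. It reports
// whether narrow loads could be merged into one load of width bytes.
//   - unalignedOK: the target handles misaligned accesses efficiently.
//   - knownAlign:  alignment in bytes that is proven for the address.
//   - width:       size in bytes of the merged load (a power of two).
func canCombine(unalignedOK bool, knownAlign, width int64) bool {
	if unalignedOK {
		// Current behavior: merging is always allowed on such targets.
		return true
	}
	// Proposed extension: merging is also allowed when the address is
	// known to be aligned to a multiple of the merged width.
	return knownAlign >= width && knownAlign%width == 0
}

func main() {
	fmt.Println(canCombine(true, 1, 8))  // amd64-like target, no alignment info
	fmt.Println(canCombine(false, 1, 8)) // riscv64-like target, bare byte slice
	fmt.Println(canCombine(false, 8, 8)) // riscv64-like target, array after a uint64 field
}
```

Under this sketch, the struct S from the report would qualify on riscv64 because arr follows a uint64 field, while a bare byte slice of unknown origin would not.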

@mknyszek mknyszek added this to the Unplanned milestone Feb 18, 2025
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 18, 2025