cmd/compile: arm64: emit double-register loads and stores in lowering #19715
Labels
compiler/runtime
Issues related to the Go compiler and/or runtime.
NeedsFix
The path to resolution is known, but the work has not been done.
Performance
ARMv8 has "double-register" loads and stores (`ldp` and `stp`), which the present SSA backend does not emit.

Since these instructions can accept any general-purpose register (unlike the corresponding ARMv7 instructions `ldrd` and `strd`), they shouldn't require any changes to the register allocator in order to get them working. I think things will "just work" if `ldp` returns a tuple of the loaded values.

In particular, I'd expect this change to reduce the sequence of instructions preceding and following a call op (with the existing ABI) by up to a factor of 2. We can also use `stp` to zero 16 bytes at once.

The `stp` rule should be expressible with the existing SSA rewrite rule infrastructure. The `ldp` rule is more challenging, because there is no way to iterate the uses of a memory value in the rewrite rules. We may have to introduce an architecture-dependent pass just before lower to combine adjacent 4- and 8-byte loads into paired 8- and 16-byte loads.

CC: @williamweixiao

I'm happy to take this on, but please let me know if you have similar work planned.
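To make the load-pairing idea concrete, here is a rough sketch of the matching step such a pass might perform. It is a toy model, not compiler code: the `load` and `pair` types and the `pairLoads`, `adjacent`, and `fitsLDP` helpers are hypothetical names invented for illustration and do not correspond to anything in `cmd/compile/internal/ssa`. The sketch assumes two loads can be combined when they share a base pointer, a memory state, and a size of 4 or 8 bytes, sit at consecutive offsets, and the lower offset fits the scaled signed 7-bit immediate that `ldp` accepts.

```go
package main

import (
	"fmt"
	"sort"
)

// load is a toy stand-in for an SSA load value (hypothetical, not the compiler's type).
type load struct {
	base string // name of the base pointer value
	off  int64  // byte offset from base
	size int64  // access width in bytes: 4 or 8
	mem  int    // ID of the memory state the load reads from
}

// pair holds two loads that a single ldp could replace.
type pair struct {
	lo, hi load
}

// fitsLDP reports whether off is encodable as an ldp immediate:
// a multiple of size in the range [-64*size, 63*size].
func fitsLDP(off, size int64) bool {
	return off%size == 0 && off >= -64*size && off <= 63*size
}

// adjacent reports whether b reads the bytes immediately after a,
// from the same base and memory state, with the same width.
func adjacent(a, b load) bool {
	return a.base == b.base && a.mem == b.mem && a.size == b.size &&
		(a.size == 4 || a.size == 8) &&
		b.off == a.off+a.size && fitsLDP(a.off, a.size)
}

// pairLoads greedily groups loads into ldp candidates.
// Loads that find no partner are returned in rest.
func pairLoads(loads []load) (pairs []pair, rest []load) {
	sorted := append([]load(nil), loads...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.base != b.base {
			return a.base < b.base
		}
		if a.mem != b.mem {
			return a.mem < b.mem
		}
		if a.size != b.size {
			return a.size < b.size
		}
		return a.off < b.off
	})
	for i := 0; i < len(sorted); i++ {
		if i+1 < len(sorted) && adjacent(sorted[i], sorted[i+1]) {
			pairs = append(pairs, pair{sorted[i], sorted[i+1]})
			i++ // the partner has been consumed
			continue
		}
		rest = append(rest, sorted[i])
	}
	return pairs, rest
}

func main() {
	pairs, rest := pairLoads([]load{
		{"sp", 8, 8, 1}, {"sp", 0, 8, 1}, {"sp", 16, 8, 1},
		{"r1", 0, 4, 1}, {"r1", 4, 4, 1},
	})
	fmt.Println("paired:", pairs)
	fmt.Println("left as single loads:", rest)
}
```

A real pass would also have to confirm that both loads are scheduled in the same block and then replace their uses with selections from the tuple result. The sketch covers only the offset and memory-state matching.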