-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/asm: ARM64 instructions FLDPQ and FSTPQ not supported #40725
Comments
I found a bunch more issues with the assembler. Is it okay if I make a large issue just gathering all of them? |
Some more valid (I hope) instructions that don't encode:
There's probably a bunch more. Right now, almost every single SIMD instruction I want to use is broken in the assembler. This is not a sustainable state of affairs. |
@fuzxxl Thank you for the observation. |
@zhangfannie Best would be to have support for all SIMD and FP instructions, but fixing those I mentioned might be a good start. I have not written the code yet, so I can't tell for sure what exactly it is going to involve. If you like, I can try to implement the algorithm using the GNU assembler and then tell you what instructions I used. How does that sound? |
Please don't take this as a critisism, but as an observer of a number of this class of request, the best results are obtained when the OP, you in this case, can enumerate exactly which instructions to add. I have no explanation why requests for all XXX instructions are unsuccessful, but encourage you to list precisely the instructions you would like to see added as there is anecdotal evidence that requests formed in this way are resolved faster. |
Thanks, will do so. I'll proceed and prototype the algorithm in the GNU assembler so I can tell you which instructions exactly I'm going to need. |
So, I've now sketched out a prototype using the UNIX assembler. The following new instructions are required:
Especially important is the literal-pool |
@fuzxxl We will enable the above instructions ASAP. 🙂 |
Thanks! Very much appreciated. |
@fuzxxl The current go assembler have enabled FMOV (scalar, immediate) instruction, the go syntax is FMOVS and FMOVD. And according to ARM reference, there are no instructions like FMOVQ . |
The instruction I want is listed in the section “LDR (literal, SIMD&FP)” of the ARM reference:
here,
The assembler needs to place the immediate in a nearby literal pool and substitute its address. Of course, the assembler is free to use the “FMOV (scalar, immediate)” variant if the immediate fits. Similarly, it should translate these to The desired usage is to load a bit mask into a SIMD register:
Perhaps the mnemonics |
This prototype performs at 2.25 GB/s on a Raspberry Pi 3B and is written for the GNU assembler. I was unable to write it directly for the Go assembler as critical defects prevent me from using the desired instructions. Relevant issue in the Go bug tracker: golang/go#40725
Change https://golang.org/cl/249757 mentions this issue: |
Change https://golang.org/cl/249758 mentions this issue: |
Thanks for the changes. You made a typo in changeset 249758, file
|
I will modify it. 😅 Thank you. |
Due to some further optimisations in the code, I might also need the following instructions:
It would be great if you could add them, too. I hope this covers all of them. Thanks for all your great help! I'm feeling like we're getting somewhere with this. |
@fuzxxl You are welcome. We will enable the assembler's support for these instructions ASAP. 🙂 |
Thanks a lot for your hard work! |
Change https://golang.org/cl/251519 mentions this issue: |
Change https://golang.org/cl/253659 mentions this issue: |
Enable VBSL, VBIT, VCMTST, VUXTL VUXTL2 and FMOVQ SIMD instructions required by the issue #40725. And FMOVQ instrucion is used to move a large constant to a Vn register. Add test cases. Fixes #40725 Change-Id: I1cac1922a0a0165d698a4b73a41f7a5f0a0ad549 Reviewed-on: https://go-review.googlesource.com/c/go/+/249758 Reviewed-by: Cherry Zhang <cherryyz@google.com>
This CL adds USHLL, USHLL2, UZP1, UZP2, and BIF instructions requested by #40725. And since UXTL* are aliases of USHLL*, this CL also merges them into one case. Updates #40725 Change-Id: I404a4fdaf953319f72eea548175bec1097a2a816 Reviewed-on: https://go-review.googlesource.com/c/go/+/253659 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
@zhangfannie I've tried your new instructions, thanks a lot! One thing I noticed is that
fails to assemble with an “immediate out of range” message. And even if I choose a suitably sized immediate like
the assembler only generates a 64 bit literal in the literal pool despite this instruction reading 128 bit from the literal pool. I would be great to have this improved. |
@fuzxxl For the issue of "immediate out of range". This issue is reported by strconv.ParseUint(str, 0, 64) called by p.atoi() in the process of parsing immediate. The current go only supports bit sizes 0, 8, 16, 32 and 64, if bit size is below 0 or above 64, an error is returned (Please see https://golang.org/pkg/strconv/#ParseUint). If we want to parse a 128 bit immediate, we need to support 128 bit sizes. @cherrymui Can we do it? Thank you. |
Go doesn't have 128-bit integers (for now), so I don't think we want to do that. Instead, can we let it take two 64-bit operands, for the high and low parts separately? E.g. Or, we only support loading from memory, and let user write large constants to memory, something like
GNU assembler doesn't seem to support directly loading large constant either. It needs a label. We could use a global data symbol. |
@cherrymui The problem with a global symbol (as opposed to a literal pool) is that instructions are wasted generating the address of the global symbol. This incurs needless cost in high-performance code. Having two separate operands would be fine with me and I suppose it's a good alternative. Another idea would be to support a form that loads from a local label such that the programmer can manually create a literal pool. Any resolution on the inconsistency of using |
How many instructions are there like this, where it takes a constant that doesn't fit into 64-bit? Is this the only case or there are still many? I'm less sure about supporting manual constant pool management, mainly for consistency (there have been several instructions that use the constant pool managed by the assembler). I agree that the line between FP instruction and vector instruction is blurry. Does this instruction have anything specific to FP? If not, maybe we should just use VMOVQ (or VMOV) and take a V register? |
@cherrymui I am unaware of any other instructions operating on 128 bit constants in a literal pool. I think having to write this one with two immediates is a workable choice. The instruction is coded as taking a FP-style register in the ARM reference manual, so |
@cherrymui @clausecker Thank you for the solution. I will post a patch to write 128 bit with two immediates. |
@cherrymui @clausecker I spent some time trying to find a SIMD-FP instruction to do the "Dn << 64 + Dm = Qd" operation we need. Unfortunately, I did not find such an instruction 😥. Do you have any ideas? Thank you. |
@zhangfannie I'm not sure what you mean. The idea was to have the literal-loading form of
where
Or perhaps I misunderstood something here? |
@clausecker Sorry, my understanding is wrong. The patch will be submitted soon. Thank you for the explanation. 😊 |
@cherrymui @zhangfannie While we're add it, I have difficulty understanding what the Go syntax for a load or store instruction like these is:
I'm sure they are implemented (at least I hope), but neither of these variants seem to work:
Have I missed anything? |
FMOVQ/VMOVQ seem the right syntax (MOVQ is not). They don't seem to be implemented yet, though. For now, FMOVQ only supports constant source operand, and VMOVQ does not exist. (We have VMOV, but that is currently used only for moves between V registers.) |
@cherrymui Being able to load/store SIMD and FP registers would be a useful addition I suppose. I can use |
I agree it is useful. It just needs to be implemented. |
Thanks! If you're adding this one, could you also add |
@clausecker We will do it. Thank you. |
@zhangfannie Super cool! Reading the manual further, I believe |
Change https://golang.org/cl/255900 mentions this issue: |
… on arm64 The CL 249758 added `FMOVQ $vcon, Vd` instruction and assembler used 128-bit simd literal-loading to load `$vcon` from pool into 128-bit vector register `Vd`. Because Go does not have 128-bit integers for now, the assembler will report an error of `immediate out of range` when assembleing `FMOVQ $0x123456789abcdef0123456789abcdef, V0` instruction. This patch lets 128-bit integers take two 64-bit operands, for the high and low parts separately and adds `VMOVQ $hi, $lo, Vd` instruction to move `$hi<<64+$lo' into 128-bit register `Vd`. In addition, this patch renames `FMOVQ/FMOVD/FMOVS` ops to 'VMOVQ/VMOVD/VMOVS' and uses them to move 128-bit, 64-bit and 32-bit constants into vector registers, respectively Update the go doc. Fixes #40725 Change-Id: Ia3c83bb6463f104d2bee960905053a97299e0a3a Reviewed-on: https://go-review.googlesource.com/c/go/+/255900 Trust: fannie zhang <Fannie.Zhang@arm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
@clausecker The patch is ready and is under internal review. I will submit it as soon as possible. Thank you. |
Change https://golang.org/cl/258397 mentions this issue: |
This patch enables VSLI, VUADDW(2), VUSRA and FMOVQ SIMD instructions required by the issue #40725. And the GNU syntax of 'FMOVQ' is 128-bit ldr/str(immediate, simd&fp). Add test cases. Fixes #40725 Change-Id: Ide968ef4a9385ce4cd8f69bce854289014d30456 Reviewed-on: https://go-review.googlesource.com/c/go/+/258397 Trust: fannie zhang <Fannie.Zhang@arm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
With 3089ef6, the Go project finally supports VUSRA, FMOVQ, VUADDW, and VUADDW2. Make use of them. Unfortunately, not all addressing modes seem to be supported for FMOVQ and support for VUADDL and VUADDL2 has not yet landed. This will be added in a future commit. See also golang/go#40725
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Assemble the following ARM64 assembly file:
What did you expect to see?
An object file with instructions that disassemble to the GNU-style instructions
What did you see instead?
It appears that only
FLDPS
andFLDPD
are supported. Going through the instruction lists, it appears that no FP instructions with 128 bit instruction size seem to be supported at all. It would be great to have them added as I need them for a project.The text was updated successfully, but these errors were encountered: