cmd/internal/obj/x86: AVX512 design #22779
Comments
Masking: ARM uses
Zeroing: this is conceptually part of masking (at least in my view), so the zero modifier must be part of the masking syntax, not part of the opcode or elsewhere. If we go with
Broadcast: I think having an opcode suffix
Rounding: Again, an opcode suffix is probably the best, and it doesn't overlap with broadcasting anyway. |
@rasky, I believe that rounding can't be specified without
Maybe VADDPD.RN_SAE Z4, Z5, Z6 |
For reference, related to the "VEX vs EVEX" question:
$ ./xed -64 -e VADDPD xmm0 xmm1 xmm2
Request: VADDPD MODE:2, REG0:XMM0, REG1:XMM1, REG2:XMM2, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Encodable! C5F158C2
.byte 0xc5,0xf1,0x58,0xc2
// (basically) same operation, but with EVEX encoding.
$ ./xed -64 -e VADDPD xmm0 k0 xmm1 xmm2
Request: VADDPD MODE:2, REG0:XMM0, REG1:K0, REG2:XMM1, REG3:XMM2, SMODE:2
OPERAND ORDER: REG0 REG1 REG2 REG3
Encodable! 62F1F50858C2
.byte 0x62,0xf1,0xf5,0x08,0x58,0xc2
// As a consequence, an attempt to use high-16 registers
// without k0 results in an error.
$ ./xed -64 -e VADDPD xmm17 xmm18 xmm19
Request: VADDPD MODE:2, REG0:XMM17, REG1:XMM18, REG2:XMM19, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Could not encode: VADDPD xmm17 xmm18 xmm19
Error code was: GENERAL_ERROR
[XED CLIENT ERROR] Dying
The same trick is applicable to x86 asm.
Instead of:
This simple change makes it possible for the programmer to explicitly choose which encoding to use. Not sure if it addresses this:
CC @TocarIP |
@quasilyte I noticed the same behaviour in xed regarding EVEX instructions (including the need to mention a k register explicitly for registers >= 16). Sticking to the same behaviour nicely addresses the issue (and even allows mixed-mode routines, which is a benefit). |
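The xed behaviour quoted above can be sketched as a small decision function. This is only an illustration under simplifying assumptions: `Operands`, `chooseEncoding` and `errNeedsMask` are hypothetical names, not part of xed or cmd/internal/obj/x86, and a real encoder looks at far more than register indexes.

```go
package main

import (
	"errors"
	"fmt"
)

// Encoding is the encoding scheme selected for one instruction.
type Encoding int

const (
	VEX Encoding = iota
	EVEX
)

// Operands is a hypothetical, simplified view of a vector instruction:
// the indexes of its X/Y registers and whether a K operand was written.
type Operands struct {
	VecRegs []int // e.g. xmm17 -> 17
	HasMask bool  // true when k0-k7 appears explicitly
}

var errNeedsMask = errors.New("high-16 register used without an explicit K operand")

// chooseEncoding mirrors the xed runs above: an explicit K operand
// selects EVEX, everything else is VEX, and registers 16-31 are
// rejected when EVEX was not requested.
func chooseEncoding(ops Operands) (Encoding, error) {
	if ops.HasMask {
		return EVEX, nil
	}
	for _, r := range ops.VecRegs {
		if r >= 16 {
			return VEX, errNeedsMask
		}
	}
	return VEX, nil
}

func main() {
	cases := []Operands{
		{VecRegs: []int{0, 1, 2}},                // VADDPD xmm0 xmm1 xmm2
		{VecRegs: []int{0, 1, 2}, HasMask: true}, // VADDPD xmm0 k0 xmm1 xmm2
		{VecRegs: []int{17, 18, 19}},             // VADDPD xmm17 xmm18 xmm19
	}
	for _, c := range cases {
		enc, err := chooseEncoding(c)
		fmt.Println(enc == EVEX, err)
	}
}
```

With a rule like this the programmer, not the assembler, decides which encoding is emitted, which is what makes mixed VEX/EVEX routines possible.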
golang-dev first message, 5th question (@TocarIP):
List of such instructions that is based on x86.csv v0.2:
Planning to use X/Y/Z suffixes for now.
At least two options: use a suffix only when the proper instruction can't be encoded without it (a) or
VCVTPD2PSX (%rax), %xmm0
VCVTPD2PSY (%rax), %xmm0
VCVTPD2PS (%rax), %ymm0
VFPCLASSPSX $1, (%rax), %k0{%k1}
VFPCLASSPSY $1, (%rax), %k0{%k1}
VFPCLASSPSZ $1, (%rax), %k0{%k1}
With the current quality of error messages, it could be very disappointing to see "error: invalid instruction" whether or not the "Z" suffix is used. Maybe an alias can help here, but it would be better to have more precise error messages (see #21860). |
Work in progress examples:
// Embedded rounding.
VADDPD.RU_SAE Z3, Z2, K1, Z1 // 62f1ed5958cb
VADDPD.RD_SAE Z3, Z2, K1, Z1 // 62f1ed3958cb
VADDPD.RZ_SAE Z3, Z2, K1, Z1 // 62f1ed7958cb
VADDPD.RN_SAE Z3, Z2, K1, Z1 // 62f1ed1958cb
VADDPD.RU_SAE.Z Z3, Z2, K1, Z1 // 62f1edd958cb
VADDPD.RD_SAE.Z Z3, Z2, K1, Z1 // 62f1edb958cb
VADDPD.RZ_SAE.Z Z3, Z2, K1, Z1 // 62f1edf958cb
VADDPD.RN_SAE.Z Z3, Z2, K1, Z1 // 62f1ed9958cb
// Embedded broadcasting.
VADDPD.BCST (AX), X2, K1, X1 // 62f1ed195808
VADDPD.BCST.Z (AX), X2, K1, X1 // 62f1ed995808
VADDPD.BCST (AX), Y2, K1, Y1 // 62f1ed395808
VADDPD.BCST.Z (AX), Y2, K1, Y1 // 62f1edb95808
VADDPD.BCST (AX), Z2, K1, Z1 // 62f1ed595808
VADDPD.BCST.Z (AX), Z2, K1, Z1 // 62f1edd95808
VMAXPD.BCST (AX), Z2, K1, Z1 // 62f1ed595f08
VMAXPD.BCST.Z (AX), Z2, K1, Z1 // 62f1edd95f08
// Suppress all exceptions (SAE).
VMAXPD.SAE Z3, Z2, K1, Z1 // 62f1ed595fcb or 62f1ed195fcb
VMAXPD.SAE.Z Z3, Z2, K1, Z1 // 62f1edd95fcb or 62f1ed995fcb
VCMPSD.SAE $0, X0, X2, K0 // 62f1ef18c2c000
VCMPSD.SAE $0, X0, X2, K1, K0 // 62f1ef19c2c000
VMAXPD (AX), Z2, K1, Z1 // 62f1ed495f08
// Multisource operands (4FMAPS/4VNNIW register range operand).
VP4DPWSSD (AX), [Z0-Z3], K1, Z7 // 62f27f495238
VP4DPWSSD 7(DX), [Z0-Z3], K1, Z7 // 62f27f4952ba07000000
// K write mask.
VADDPD X30, X1, X0 // 6291f50858c6
VADDPD X2, X1, K1, X0 // 62f1f50958c2
Details:
|
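To make the encodings in the examples above easier to read, here is a minimal sketch that pulls the suffix-related fields out of the fourth EVEX byte (P2). The bit layout (z in bit 7, L'L in bits 6-5, b in bit 4, aaa in bits 2-0) follows the Intel SDM; the names `evexFields`, `decodeEVEX` and `roundingSuffix` are made up for this illustration and do not come from the Go assembler.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// evexFields holds the parts of an EVEX encoding that the
// .Z, .BCST, .SAE and rounding suffixes influence.
type evexFields struct {
	Zeroing  bool  // P2 bit 7 (z)
	LL       uint8 // P2 bits 6-5 (vector length or rounding control)
	B        bool  // P2 bit 4 (broadcast / rounding / SAE)
	Mask     uint8 // P2 bits 2-0 (aaa, the K register index)
	RegToReg bool  // ModRM.mod == 3
}

// decodeEVEX expects a hex string such as "62f1ed5958cb":
// 0x62, three payload bytes, one opcode byte, then ModRM.
func decodeEVEX(hexEnc string) (evexFields, error) {
	enc, err := hex.DecodeString(hexEnc)
	if err != nil {
		return evexFields{}, err
	}
	if len(enc) < 6 || enc[0] != 0x62 {
		return evexFields{}, fmt.Errorf("not an EVEX encoding: %q", hexEnc)
	}
	p2, modrm := enc[3], enc[5]
	return evexFields{
		Zeroing:  p2&0x80 != 0,
		LL:       (p2 >> 5) & 3,
		B:        p2&0x10 != 0,
		Mask:     p2 & 7,
		RegToReg: modrm>>6 == 3,
	}, nil
}

// roundingSuffix maps L'L to the suffix spelling used in the examples.
// It only applies to register forms with b set.
var roundingSuffix = [4]string{"RN_SAE", "RD_SAE", "RU_SAE", "RZ_SAE"}

func main() {
	for _, e := range []string{"62f1ed5958cb", "62f1edb958cb", "62f1ed595808"} {
		f, err := decodeEVEX(e)
		if err != nil {
			fmt.Println(err)
			continue
		}
		switch {
		case f.B && f.RegToReg:
			fmt.Printf("%s: rounding=%s K%d zeroing=%v\n", e, roundingSuffix[f.LL], f.Mask, f.Zeroing)
		case f.B:
			fmt.Printf("%s: broadcast L'L=%d K%d zeroing=%v\n", e, f.LL, f.Mask, f.Zeroing)
		default:
			fmt.Printf("%s: plain L'L=%d K%d zeroing=%v\n", e, f.LL, f.Mask, f.Zeroing)
		}
	}
}
```

For SAE-only instructions such as VMAXPD the L'L bits do not select a rounding mode, which is consistent with the two alternative encodings listed for VMAXPD.SAE.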
The "always prefer VEX over EVEX" rule, combined with "no explicit K0" and no other way to enforce EVEX encoding, leads to this:
// VEX -- OK.
VADDPD (BX), X9, X2
// Two possible outcomes:
// a) signal error: "instruction does not support zeroing".
// b) select EVEX-encoded form.
VADDPD.Z (BX), X9, X2
I do believe that In my opinion, it's a
// EVEX -- OK.
VADDPD.Z (BX), X9, K0, X2 |
I'm not sure that VADDPD.Z (BX), X9, X2 is an important case. As far as I understand this will use default write mask k0, so no element will be zeroed. However we have the same problem with broadcasting, which can be useful without any masks. |
With a simple rule like "skip non-EVEX forms if instruction has any suffixes", we can solve those issues of operand based matching:
// Forced EVEX encoding due to suffixes.
VADDPD.B4 2032(DX), X0, X0 // 62f1fd185882f0070000
VADDPD.B8 2032(DX), Y0, Y0 // 62f1fd385882f0070000
This is possible because x86 uses suffixes only for AVX512 features. |
@TocarIP, zeroing without masking is permitted by GAS, but rejected by, for example, XED. It can be used to force EVEX encoding when VEX will be selected otherwise (see above). |
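The "skip non-EVEX forms if instruction has any suffixes" rule could look roughly like the sketch below. The `form` type and both functions are hypothetical and much simpler than the real optab/ytab matching; they only show how the presence of a suffix filters the candidate list before operand matching.

```go
package main

import (
	"fmt"
	"strings"
)

// form is a hypothetical stand-in for one encodable variant of an opcode.
type form struct {
	name string
	evex bool
}

// splitSuffixes splits "VADDPD.RU_SAE.Z" into the opcode "VADDPD"
// and the suffix list ["RU_SAE", "Z"].
func splitSuffixes(mnemonic string) (string, []string) {
	parts := strings.Split(mnemonic, ".")
	return parts[0], parts[1:]
}

// candidateForms keeps every form when there are no suffixes and
// only the EVEX forms when at least one suffix is present.
func candidateForms(mnemonic string, forms []form) []form {
	_, suffixes := splitSuffixes(mnemonic)
	if len(suffixes) == 0 {
		return forms
	}
	var out []form
	for _, f := range forms {
		if f.evex {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	forms := []form{
		{name: "VEX.128", evex: false},
		{name: "EVEX.128", evex: true},
	}
	for _, m := range []string{"VADDPD", "VADDPD.Z", "VADDPD.B4"} {
		fmt.Println(m, candidateForms(m, forms))
	}
}
```

Under this rule a suffix such as .Z without a mask becomes a cheap way to force EVEX where VEX would otherwise win.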
Change https://golang.org/cl/104496 mentions this issue: |
Up-to-date examples: https://golang.org/cl/107217. |
Change https://golang.org/cl/107216 mentions this issue: |
Change https://golang.org/cl/113315 mentions this issue: |
Now generates both VEX and EVEX encoded optabs.
Encoder based on these optabs passes tests added in https://golang.org/cl/107217.
This version uses XED datafiles directly instead of x86.csv.
Also moves x86/x86spec/xeddata package to x86/xeddata to make it usable from x86 packages.
Ported x86spec pattern set type to xeddata.
Updates golang/go#22779
Change-Id: I304267d888dcda4f776d1241efa524f397a8b7b3
Reviewed-on: https://go-review.googlesource.com/107216
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
- Uncomment tests for AVX512 encoder
- Permit instruction suffixes for x86
- Permit limited reg list [reg-reg] syntax for x86 for multi-source ops
- EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
- optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)
Note: suffix formatting implemented with updated CConv function. Now arch asm backend should register formatting function by calling RegisterOpSuffix.
Updates #22779
Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8
Reviewed-on: https://go-review.googlesource.com/113315
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Documentation is published in the form of a Go wiki page. I'll send a small update to
This issue can be closed if there are no open questions left. |
Ping @cherrymui. Please close this issue if you are happy with the decisions documented on the wiki page. EDIT - Corrected Cherry's name so that github notification goes out. |
The wiki page looks ok to me. Closing. For testing, there is #25724 still open. If we do anything there, it may affect the design and the wiki page. But we can discuss it there. |
Discussion started here: golang-dev: AVX512 syntax
This issue keeps track of all agreed implementation/design choices (though most things may change many times), as well as discussed alternatives.
If you can and/or want to participate, please leave a comment here or in the thread linked above.
1. Accepted solutions:
1.1. New registers:
X16-X31
Y16-Y31
Z0-Z31
K0-K7 masking registers. Exact operand syntax/position not yet decided.
1.2. AVX512_4FMAPS register range operand:
Specified with ARM NEON-style register ranges syntax: [Rx-Ry]
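As a rough illustration of the range syntax, a parser for operands such as [Z0-Z3] might look like the sketch below. `parseRegRange` and `splitReg` are hypothetical helpers written for this example, not the assembler's actual parser, and the only checks assumed here are matching register classes and an ascending range.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitReg splits a register name like "Z3" into its class "Z" and index 3.
func splitReg(name string) (string, int, error) {
	i := strings.IndexAny(name, "0123456789")
	if i <= 0 {
		return "", 0, fmt.Errorf("bad register name %q", name)
	}
	n, err := strconv.Atoi(name[i:])
	if err != nil {
		return "", 0, fmt.Errorf("bad register index in %q", name)
	}
	return name[:i], n, nil
}

// parseRegRange parses "[Rx-Ry]" and returns the register class
// together with the inclusive index bounds.
func parseRegRange(s string) (class string, lo, hi int, err error) {
	if !strings.HasPrefix(s, "[") || !strings.HasSuffix(s, "]") {
		return "", 0, 0, fmt.Errorf("range %q must be enclosed in []", s)
	}
	first, last, ok := strings.Cut(s[1:len(s)-1], "-")
	if !ok {
		return "", 0, 0, fmt.Errorf("range %q has no '-'", s)
	}
	c1, lo, err := splitReg(first)
	if err != nil {
		return "", 0, 0, err
	}
	c2, hi, err := splitReg(last)
	if err != nil {
		return "", 0, 0, err
	}
	if c1 != c2 {
		return "", 0, 0, fmt.Errorf("mixed register classes in %q", s)
	}
	if lo > hi {
		return "", 0, 0, fmt.Errorf("descending range %q", s)
	}
	return c1, lo, hi, nil
}

func main() {
	fmt.Println(parseRegRange("[Z0-Z3]"))
	fmt.Println(parseRegRange("[Z0-Y3]"))
}
```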
2. Subjects under discussion:
Special syntax like {1toX} and {sae} is avoided:
2.1. Masking register syntax.
2.2. Zeroing syntax.
2.3. Encoding selection: VEX vs EVEX.
a) Always favor VEX encoding variants.
b) Some kind of flag to enable EVEX_ENCODING whenever beneficial.
c) Require explicit K operand for EVEX variants to give programmer full control over selected encoding.
2.4. Broadcast.
2.5. Rounding.
3. Key notes:
Useful information about past/current trade-offs
X/Y/Z registers to make information search easier. This also implies that VEX/EVEX encoding should not be resolved by a special suffix/prefix (there was no such problem with SSE vs VEX because most of the latter opcodes are prefixed with "V").
KADDD. Go uses the L suffix for 32-bit operands, so the KADDL opcode is used instead.