
proposal: cmd/asm: untyped imm8 operand class for x86 asm #21528

Closed
quasilyte opened this issue Aug 18, 2017 · 12 comments

quasilyte commented Aug 18, 2017

Many amd64 instructions take an immediate operand without automatic sign extension.
When only the binary representation matters, "$128" and "$-128" encode the same byte.

In the Intel manual, VPBLENDD has an imm8 operand class, meaning "signed 8-bit value".
But the semantics only care about the bits, so it makes perfect sense to pass 0xF0 as an argument.
In fact, src/crypto/sha512/sha512block_amd64.s:524 does exactly this:
VPBLENDD $0xF0, Y2, Y4, Y4

GAS also happily accepts both positive and negative immediate values.

VPSHUFD $-128, (%R11), %YMM15
VPSHUFD $128, (%R11), %YMM15

// Note: same encoding
   6:	c4 41 7d 70 3b 80    	vpshufd $0x80,(%r11),%ymm15
   c:	c4 41 7d 70 3b 80    	vpshufd $0x80,(%r11),%ymm15

I propose a backwards-compatible change that introduces a new opclass, Yimmb, which is the union of the {Yi0, Yi1, Yi8, Yu8, Yu7} opclasses via ycover.
From the user's point of view, this makes it possible to express an immediate argument in either signed or unsigned form.
As a first step, it reduces yvex_xyi3 to 2 ytab entries.

A further step, which may be proposed later, is allowing more instructions to accept the Yimmb opclass instead of Yi8/Yu8.

Pros:

  • Better compatibility with existing assemblers
  • Easier to express "imm8" operands that are not numerical values by nature (e.g. masks)
  • Easier to automatically generate instructions that work without artificial restrictions
  • If we really need both Yi8 and Yu8, we can encode them in 1 ytab entry instead of duplicating (see yvex_xyi3)

Even if asm6.go is going to be generated automatically at some point in the future,
the exact timeline is unknown, so this simple change makes sense for the current code.

Implementation:

A possible patch is provided below.

Yimmb.patch
diff --git a/src/cmd/asm/internal/asm/testdata/amd64enc.s b/src/cmd/asm/internal/asm/testdata/amd64enc.s
index ec888bc..02ccebb 100644
--- a/src/cmd/asm/internal/asm/testdata/amd64enc.s
+++ b/src/cmd/asm/internal/asm/testdata/amd64enc.s
@@ -10681,4 +10681,17 @@ TEXT asmtest(SB),DUPOK|NOSPLIT,$0
 	//TODO: XSAVES64 (R11)                  // 490fc72b
 	//TODO: XSETBV                          // 0f01d1
 	XTEST                                   // 0f01d6
+	// Note: code below is written by hand,
+	// so it seems like a good idea to move
+	// them into separate file.
+	// Maybe test runner should permit include
+	// directives (right now it will not
+	// work as intended).
+	//
+	// * Check that immb opclass works as expected.
+	VPSHUFD $-7, (R11), Y15                 // c4417d703bf9
+	VPSHUFD $-128, (R11), Y15               // c4417d703b80
+	VPSHUFD $128, (R11), Y15                // c4417d703b80
+	VPSHUFD $255, (R11), Y15                // c4417d703bff
+	VPSHUFD $0, (R11), Y1                   // c4c17d700b00
 	RET
diff --git a/src/cmd/internal/obj/x86/asm6.go b/src/cmd/internal/obj/x86/asm6.go
index bcf9318..33e3d8a 100644
--- a/src/cmd/internal/obj/x86/asm6.go
+++ b/src/cmd/internal/obj/x86/asm6.go
@@ -92,11 +92,12 @@ type Movtab struct {
 const (
 	Yxxx = iota
 	Ynone
-	Yi0 // $0
-	Yi1 // $1
-	Yi8 // $x, x fits in int8
-	Yu8 // $x, x fits in uint8
-	Yu7 // $x, x in 0..127 (fits in both int8 and uint8)
+	Yi0   // $0
+	Yi1   // $1
+	Yi8   // $x, x fits in int8
+	Yu8   // $x, x fits in uint8
+	Yu7   // $x, x in 0..127 (fits in both int8 and uint8)
+	Yimmb // $x, x <- {Yi0, Yi1, Yi8, Yu8, Yu7}
 	Ys32
 	Yi32
 	Yi64
@@ -811,10 +812,8 @@ var yvex_ri3 = []ytab{
 }
 
 var yvex_xyi3 = []ytab{
-	{Yu8, Yxm, Yxr, Zvex_i_rm_r, 2},
-	{Yu8, Yym, Yyr, Zvex_i_rm_r, 2},
-	{Yi8, Yxm, Yxr, Zvex_i_rm_r, 2},
-	{Yi8, Yym, Yyr, Zvex_i_rm_r, 2},
+	{Yimmb, Yxm, Yxr, Zvex_i_rm_r, 2},
+	{Yimmb, Yym, Yyr, Zvex_i_rm_r, 2},
 }
 
 var yvex_yyi4 = []ytab{ //TODO don't hide 4 op, some version have xmm version
@@ -1672,7 +1671,7 @@ var optab =
 	{AVPBROADCASTB, yvex_vpbroadcast, Pvex, [23]uint8{VEX_128_66_0F38_W0, 0x78, VEX_256_66_0F38_W0, 0x78}},
 	{AVPTEST, yvex_xy2, Pvex, [23]uint8{VEX_128_66_0F38_WIG, 0x17, VEX_256_66_0F38_WIG, 0x17}},
 	{AVPSHUFB, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F38_WIG, 0x00, VEX_256_66_0F38_WIG, 0x00}},
-	{AVPSHUFD, yvex_xyi3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70, VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70}},
+	{AVPSHUFD, yvex_xyi3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70}},
 	{AVPOR, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xeb, VEX_256_66_0F_WIG, 0xeb}},
 	{AVPADDQ, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xd4, VEX_256_66_0F_WIG, 0xd4}},
 	{AVPADDD, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xfe, VEX_256_66_0F_WIG, 0xfe}},
@@ -2003,17 +2002,25 @@ func instinit(ctxt *obj.Link) {
 	ycover[Yi1*Ymax+Yu8] = 1
 	ycover[Yu7*Ymax+Yu8] = 1
 
+	ycover[Yi0*Ymax+Yimmb] = 1
+	ycover[Yi1*Ymax+Yimmb] = 1
+	ycover[Yu7*Ymax+Yimmb] = 1
+	ycover[Yu8*Ymax+Yimmb] = 1
+	ycover[Yi8*Ymax+Yimmb] = 1
+
 	ycover[Yi0*Ymax+Ys32] = 1
 	ycover[Yi1*Ymax+Ys32] = 1
 	ycover[Yu7*Ymax+Ys32] = 1
 	ycover[Yu8*Ymax+Ys32] = 1
 	ycover[Yi8*Ymax+Ys32] = 1
+	ycover[Yimmb*Ymax+Ys32] = 1
 
 	ycover[Yi0*Ymax+Yi32] = 1
 	ycover[Yi1*Ymax+Yi32] = 1
 	ycover[Yu7*Ymax+Yi32] = 1
 	ycover[Yu8*Ymax+Yi32] = 1
 	ycover[Yi8*Ymax+Yi32] = 1
+	ycover[Yimmb*Ymax+Yi32] = 1
 	ycover[Ys32*Ymax+Yi32] = 1
 
 	ycover[Yi0*Ymax+Yi64] = 1
@@ -2021,6 +2028,7 @@ func instinit(ctxt *obj.Link) {
 	ycover[Yu7*Ymax+Yi64] = 1
 	ycover[Yu8*Ymax+Yi64] = 1
 	ycover[Yi8*Ymax+Yi64] = 1
+	ycover[Yimmb*Ymax+Yi64] = 1
 	ycover[Ys32*Ymax+Yi64] = 1
 	ycover[Yi32*Ymax+Yi64] = 1
@gopherbot gopherbot added this to the Proposal milestone Aug 18, 2017
@quasilyte quasilyte changed the title proposal: cmd/asm: untyped imm8 operand class for x86 asm backend proposal: cmd/asm: untyped imm8 operand class for x86 asm Aug 18, 2017
@quasilyte
Contributor Author

I want to emphasize that this does not break any existing assembly code;
it only relaxes the immediate-argument encoding restrictions (accepting both the two's complement/negative and the positive form instead of just one of them).

@gopherbot

Change https://golang.org/cl/57570 mentions this issue: cmd/internal/obj/x86: new operand class for x86 asm backend

@rsc
Contributor

rsc commented Aug 21, 2017

I tried pretty hard in the instruction tables to distinguish the ones that interpret the argument as signed vs unsigned. If one is wrong I'd like to fix that. Do you have an example of one that's wrong?

If VPBLENDD only cares about bits (and 0x80 is a bit and not the sign), then it should be defined to take uimm8.

Your comment:

Well, src/crypto/sha512/sha512block_amd64.s:524 does exactly this:
VPBLENDD $0xF0, Y2, Y4, Y4

makes me think that VPBLENDD is already correct, or else it wouldn't assemble.

@quasilyte
Contributor Author

@rsc, VPBLENDD is a bit special due to the current amd64 asm handling of 4-operand instructions.
4-operand instructions do not check the immediate against any particular oclass (Yi8/Yu8, ...).

When instructions are generated from, for example, x86.csv (in x86spec format),
we need to decide between Yi8 and Yu8.
From the spec entry, we only know that the operand has kind imm8, which can map to either Yi8 or Yu8.
It is easier to mark all such arguments with the proposed Yimmb, which permits both Yi8 and Yu8.

As a side effect, the resulting behavior is also more compatible with widely used assemblers.

@quasilyte
Contributor Author

quasilyte commented Aug 21, 2017

To be clear, the current semantics of 4-operand instructions for the immediate argument are already the same as the Yimmb oclass; it's just less explicit and feels more like a bug than a feature.

@rsc
Contributor

rsc commented Aug 21, 2017

I don't want to be sloppy here. Assembly is hard enough to get right. I want to be precise. Either it's a signed value or an unsigned value, and it's OK for the assembler to insist that you know which one.

I understand that other assemblers take a different approach, and that's fine, but it's not our approach. Our approach is to be picky, to try to help avoid bugs.

@quasilyte
Contributor Author

@rsc, I see.

Maybe we can make the x86spec format express whether immediate arguments are signed or unsigned?
If x86.csv carried this information, code generation based on it could be simplified.

@quasilyte
Contributor Author

quasilyte commented Aug 22, 2017

I tried pretty hard in the instruction tables to distinguish the ones that interpret the argument as signed vs unsigned. If one is wrong I'd like to fix that. Do you have an example of one that's wrong?

@rsc, I may be wrong, but it seems like yshl should have the Yu8 oclass instead of Yi32.
https://github.com/golang/go/blob/master/src/cmd/internal/obj/x86/asm6.go#L401

var yshl = []ytab{
	{Yi1, Ynone, Yml, Zo_m, 2},
	{Yi32, Ynone, Yml, Zibo_m, 2}, // <- here
	{Ycl, Ynone, Yml, Zo_m, 2},
	{Ycx, Ynone, Yml, Zo_m, 2},
}

If Yi32 is the right thing there,
the amount of my confusion will be enormous. :)

I can check other instructions one-by-one later.

@rsc
Contributor

rsc commented Aug 28, 2017

x86.csv does have this information: it says imm8 for signed or imm8u for unsigned. (Sorry for the late reply.)

@rsc
Contributor

rsc commented Aug 28, 2017

The yshl table can be corrected to Yu8, I believe. That code is much older than Go and is in some places sloppier than I would prefer.

@gopherbot

Change https://golang.org/cl/62190 mentions this issue: cmd/asm: restrict x86 shift ops to 8bit args

gopherbot pushed a commit that referenced this issue Sep 7, 2017
Change "yshl" and "yshb" immediate oclass from Yi32 to Yu8.
This forbids:
- negative shift counts
- shift counts that do not fit into 8 bits

Affects:
  RCL{B,L,Q,W}
  RCR{B,L,Q,W}
  ROL{B,L,Q,W}
  ROR{B,L,Q,W}
  SAL{B,L,Q,W}
  SAR{B,L,Q,W}
  SHL{B,L,Q,W}
  SHR{B,L,Q,W}

Issue #21528 has some additional context about this change.

Change-Id: I60884cb2b41a860820889fcd878ca6f564006b4a
Reviewed-on: https://go-review.googlesource.com/62190
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@quasilyte
Contributor Author

Going to close this.
All immediate operands will remain explicitly signed/unsigned.

@golang golang locked and limited conversation to collaborators Sep 8, 2018