Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: combine AND 0xff and MOVQload into MOVBQZXload #15105

Open
martisch opened this issue Apr 4, 2016 · 6 comments
Open

cmd/compile: combine AND 0xff and MOVQload into MOVBQZXload #15105

martisch opened this issue Apr 4, 2016 · 6 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Milestone

Comments

@martisch
Copy link
Contributor

martisch commented Apr 4, 2016

I noticed that for uint&0xFF go tip on amd64 generates e.g.:

0x001d 00029 (main.go:10) MOVQ "".u+8(FP), DX
0x0022 00034 (main.go:10) ANDQ $255, DX

http://play.golang.org/p/CKhi_nt615

It might be possible to make these a zero latency MOV that does not require an ALU by using MOVZX for the applicable AND values or it might be possible to change the previous load to be MOVZX.

0x0022 00034 (main.go:10) MOVBQZX DL, DX

See "Intel 64 and IA-32 Architectures Optimization Reference Manual" Section: "3.5.1.13 Zero-Latency MOV Instructions"

@brtzsnr
Copy link
Contributor

brtzsnr commented Apr 4, 2016

MOVQ "".u+8(FP), DX is a LoadReg of the argument into the DX register. I don't think we can remove this anytime soon.

@gopherbot
Copy link

CL https://golang.org/cl/21508 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 6, 2016
Where possible replace ANDQ with MOV?ZX.
Takes care that we don't regress wrt bounds checking,
for example [1000]int{}[i&255].

According to "Intel 64 and IA-32 Architectures Optimization Reference
Manual" Section: "3.5.1.13 Zero-Latency MOV Instructions"
MOV?ZX instructions have zero latency on newer processors.

Updates #15105

Change-Id: I63539fdbc5812d5563aa1ebc49eca035bd307997
Reviewed-on: https://go-review.googlesource.com/21508
Reviewed-by: Айнар Гарипов <gugl.zadolbal@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
@bradfitz bradfitz added this to the Unplanned milestone Apr 7, 2016
@randall77
Copy link
Contributor

func f(x uint) uint {
    return x & 0xff
}
func g(x *uint) uint {
    return *x & 0xff
}

code from f:

    0x0000 00000 (/usr/local/google/home/khr/go/tmp1.go:4)  MOVQ    "".x+8(FP), AX
    0x0005 00005 (/usr/local/google/home/khr/go/tmp1.go:4)  MOVBLZX AL, AX

code from g:

    0x0000 00000 (/usr/local/google/home/khr/go/tmp1.go:7)  MOVQ    "".x+8(FP), AX
    0x0005 00005 (/usr/local/google/home/khr/go/tmp1.go:7)  MOVQ    (AX), CX
    0x0008 00008 (/usr/local/google/home/khr/go/tmp1.go:7)  MOVBLZX CL, CX

So we are now rewriting the AND to a MOVZX.

Alexandru is right, combining the extension with the load in f is harder because the MOVQ is generated from a LoadReg which doesn't appear in the SSA intermediate representation until regalloc. For g we could combine the extension and load. I'll send out a CL. It won't handle all cases, but the common case like in g it should be simple. (The harder cases are indexed loads and such where we would need lots more SSA opcodes to cover all the addressing modes.)

@josharian
Copy link
Contributor

The LoadReg case reminds me a lot of #15300.

I know you have serious reservations about a post-regalloc peep pass, Keith, but it does seem like it might be the right way to handle this small, specific class of optimizations. And perhaps with docs and code review we could keep it from growing past the bare minimum necessary.

@gopherbot
Copy link

CL https://golang.org/cl/28190 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/31639 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 24, 2016
According to "Intel 64 and IA-32 Architectures Optimization Reference
Manual" Section: "3.5.1.13 Zero-Latency MOV Instructions"
MOV?ZX instructions have zero latency on newer processors.

during make.bash:
(ANDLconst [0xFF] x) -> (MOVBQZX x)
applies 422 times
(ANDLconst [0xFFFF] x) -> (MOVWQZX x)
applies 114 times

Updates #15105

Change-Id: I10933af599de3c26449c52f4b5cd859331028f39
Reviewed-on: https://go-review.googlesource.com/31639
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
gopherbot pushed a commit that referenced this issue Oct 27, 2016
For cases where we already have the ops, combine
sign or zero extension with the previous load
(even if the load is larger width).

Update #15105

Change-Id: I76c5ddd69e1f900d2a17d35503083bd3b4978e48
Reviewed-on: https://go-review.googlesource.com/28190
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
@randall77 randall77 changed the title cmd/compile: generate MOVZX instead of AND where applicable cmd/compile: combine AND 0xff and MOVQload into MOVBQZXload Apr 2, 2019
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Projects
None yet
Development

No branches or pull requests

6 participants