Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: less optimized AMD64 code #24537

Open
benshi001 opened this issue Mar 26, 2018 · 3 comments
Open

cmd/compile: less optimized AMD64 code #24537

benshi001 opened this issue Mar 26, 2018 · 3 comments
Labels
binary-size NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@benshi001
Copy link
Member

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

the newest master branch on March 25, 2018

Does this issue reproduce with the latest release?

not tried

What operating system and processor architecture are you using (go env)?

amd64/ubuntu16.04

What did you do?

test commit 4b00d3f#diff-6b1f55ed5d66ec08fb9cbe82259791e9

func ssa(a []int) bool {
        return a[0] > a[1]
}

is compiled to

        00000 (2)       TEXT    "".ssa(SB)
        00001 (2)       FUNCDATA        $0, gclocals·4032f753396f2012ad1784f398b170f4(SB)
        00002 (2)       FUNCDATA        $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
 v13    00003 (2)       MOVQ    "".a+8(SP), AX
 v9     00004 (3)       TESTQ   AX, AX
 b1     00005 (3)       JLS     13
 v25    00006 (3)       MOVQ    "".a(SP), CX
 v16    00007 (3)       MOVQ    (CX), DX
 v3     00008 (3)       CMPQ    AX, $1
 b2     00009 (3)       JLS     13
 v4     00010 (3)       CMPQ    8(CX), DX
 v27    00011 (3)       SETLT   "".~r1+24(SP)
 b4     00012 (3)       RET
 v12    00013 (3)       PCDATA  $0, $1
 v12    00014 (3)       CALL    runtime.panicindex(SB)
 b3     00015 (3)       UNDEF
        00016 (?)       END

which is expected. but

func ssb(a []int) (bool, bool) {
        return a[0] > a[1], a[2] != a[3]
}

is compiled to

        00000 (6)       TEXT    "".ssb(SB)
    	00001 (6)       FUNCDATA        $0, gclocals·4032f753396f2012ad1784f398b170f4(SB)
    	00002 (6)       FUNCDATA        $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
v7     00003 (6)       MOVQ    "".a+8(SP), AX
v4     00004 (7)       TESTQ   AX, AX
b1     00005 (7)       JLS     24
v38    00006 (7)       MOVQ    "".a(SP), CX
v17    00007 (7)       MOVQ    (CX), DX
v16    00008 (7)       CMPQ    AX, $1
b2     00009 (7)       JLS     24
v10    00010 (7)       MOVQ    8(CX), BX
v12    00011 (7)       CMPQ    BX, DX
v24    00012 (7)       CMPQ    AX, $2
b4     00013 (7)       JLS     24
v34    00014 (7)       MOVQ    16(CX), SI
v33    00015 (7)       CMPQ    AX, $3
b5     00016 (7)       JLS     24
v43    00017 (7)       MOVQ    24(CX), AX
v8     00018 (7)       CMPQ    AX, SI
v40    00019 (7)       CMPQ    BX, DX
v46    00020 (7)       SETLT   "".~r1+24(SP)
v14    00021 (7)       CMPQ    AX, SI
v48    00022 (7)       SETNE   "".~r2+25(SP)
b6     00023 (7)       RET
v13    00024 (7)       PCDATA  $0, $1
v13    00025 (7)       CALL    runtime.panicindex(SB)
 b3     00026 (7)       UNDEF
		00027 (?)       END

The issue here

  1. a dead "v8 00018 (7) CMPQ AX, SI" is not eliminated
  2. CMPQmem are not used as expected
@andybons
Copy link
Member

@josharian @cznic @randall77

@andybons andybons added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Mar 26, 2018
@andybons andybons added this to the Unplanned milestone Mar 26, 2018
@TocarIP
Copy link
Contributor

TocarIP commented Mar 26, 2018

Looks like after https://go-review.googlesource.com/86035 flagalloc pass can generate DEAD ops and we don't run late deadcode after flagalloc, this explains 1.

@randall77
Copy link
Contributor

For your first point, it has always been the case that when flagalloc needs to "spill" and "restore" flags, it leaves the old flag computation behind. It shouldn't be too hard to eliminate those comparisons. It wouldn't even require a full deadcode pass, because we know the args of the comparison are still used.

For your second point, this is working as expected, sort of. CMPQmem are generated, but they need to be undone during flagalloc. The problem is that we first schedule the CMPQmem, then a CMPQ for the bounds check. Because the bounds check clobbers flags, flagalloc must move the flag computation later, but it doesn't think it is safe to move the load later (it is safe, in this case, but it isn't in general).
I don't know whether we should fix this by making the scheduling smarter (moving the bounds checks earlier, maybe?) or teaching flagalloc to know when it is ok to move the load later.

@randall77 randall77 added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binary-size NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

5 participants