Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: useless copies inserted by the register allocator #14504

Closed
brtzsnr opened this issue Feb 25, 2016 · 5 comments
Closed

cmd/compile: useless copies inserted by the register allocator #14504

brtzsnr opened this issue Feb 25, 2016 · 5 comments

Comments

@brtzsnr
Copy link
Contributor

brtzsnr commented Feb 25, 2016

While investigating some missed optimizations I found that the register allocator inserts some useless copies. In two cases a function is compiled almost the same but one has a useless copy in it.

For example (*mspan).layout

BAD:

after regalloc [78996 ns]

b1:
v1 = InitMem <mem>
v7 = VarDef <mem> {size} v1
v11 = VarDef <mem> {n} v7
v15 = VarDef <mem> {total} v11
v19 = VarDef <mem> {total} v15
v2 = SP <uintptr> : SP
v20 = MOVQstoreconst <mem> {total} v2 v19
v23 = VarDef <mem> {n} v20
v24 = MOVQstoreconst <mem> {n} v2 v23
v27 = VarDef <mem> {size} v24
v28 = MOVQstoreconst <mem> {size} v2 v27
v29 = Arg <*mspan> {s} : s[*mspan]
v46 = LoadReg <*mspan> v29 : CX
v30 = LoweredNilCheck <void> v46 v28
Check v30 → b3
b3: ← b1
v34 = MOVQload <uintptr> [32] v46 v28 : DX
v38 = VarDef <mem> {total} v28
v59 = SHLQconst <uintptr> [13] v34 : AX
v39 = MOVQstore <mem> {total} v2 v59 v38
v45 = MOVQload <uintptr> [64] v46 v39 : CX
v47 = VarDef <mem> {size} v39
v48 = MOVQstore <mem> {size} v2 v45 v47
v63 = CMPQconst <flags> v45
UGT v63 → b6 b4
b6: ← b3
v55 = MOVQload <uintptr> {total} v2 v48 : DX
v65 = VarDef <mem> {n} v48
v31 = Copy <uintptr> v55 : AX   // EXTRA EXTRA EXTRA
v62 = DIVQU <uintptr> v31 v45 : AX
v66 = MOVQstore <mem> {n} v2 v62 v65
Plain → b5
b5: ← b4 b6
v67 = Phi <mem> v48 v66
Ret v67
b4: ← b3
Plain → b5

OK:

after regalloc [66233 ns]

b1:
v1 = InitMem <mem>
v7 = VarDef <mem> {size} v1
v11 = VarDef <mem> {n} v7
v15 = VarDef <mem> {total} v11
v19 = VarDef <mem> {total} v15
v2 = SP <uintptr> : SP
v20 = MOVQstoreconst <mem> {total} v2 v19
v23 = VarDef <mem> {n} v20
v24 = MOVQstoreconst <mem> {n} v2 v23
v27 = VarDef <mem> {size} v24
v28 = MOVQstoreconst <mem> {size} v2 v27
v29 = Arg <*mspan> {s} : s[*mspan]
v61 = LoadReg <*mspan> v29 : CX
v30 = LoweredNilCheck <void> v61 v28
Check v30 → b3
b3: ← b1
v34 = MOVQload <uintptr> [32] v61 v28 : DX
v38 = VarDef <mem> {total} v28
v46 = SHLQconst <uintptr> [13] v34 : AX
v39 = MOVQstore <mem> {total} v2 v46 v38
v45 = MOVQload <uintptr> [64] v61 v39 : CX
v47 = VarDef <mem> {size} v39
v48 = MOVQstore <mem> {size} v2 v45 v47
v53 = CMPQconst <flags> v45
UGT v53 → b4 b7
b4: ← b3
v55 = MOVQload <uintptr> {total} v2 v48 : AX
Plain → b6
b6: ← b4
v65 = VarDef <mem> {n} v48
v62 = DIVQU <uintptr> v55 v45 : AX
v66 = MOVQstore <mem> {n} v2 v62 v65
Plain → b5
b5: ← b7 b6
v67 = Phi <mem> v48 v66
Ret v67
b7: ← b3
Plain → b5

BAD has an extra "v31 = Copy v55 : AX". In BAD v31 and v55 are in the same block, but in OK v31 and v55 are in blocks b4 & b6 which haven't been merged in a previous pass.

@brtzsnr
Copy link
Contributor Author

brtzsnr commented Feb 25, 2016

@randall77

@randall77
Copy link
Contributor

Yes, we need to backwards-propagate more "I would like the result in this register" information. I've started on a CL for this a while ago but haven't gotten back to it.

@randall77 randall77 self-assigned this Feb 25, 2016
@gopherbot
Copy link

CL https://golang.org/cl/19952 mentions this issue.

@josharian josharian changed the title dev.ssa: Useless copies inserted by the register allocator cmd/compile: useless copies inserted by the register allocator Mar 3, 2016
@nanoant
Copy link

nanoant commented Mar 3, 2016

Here is another test case, originally reported at: https://groups.google.com/forum/#!topic/golang-dev/LSD504gbeBM

package fac

func Fac(n int64) int64 {
    if n < 2 {
        return 1
    }
    return n * Fac(n-1)
}

Assembly dump with go tool compile -S fac.go using latest master at the time of writing this.

    0x0000 00000 (fac.go:3) TEXT    "".Fac(SB), $16-16
    0x0000 00000 (fac.go:3) MOVQ    (TLS), CX
    0x0009 00009 (fac.go:3) CMPQ    SP, 16(CX)
    0x000d 00013 (fac.go:3) JLS 84
    0x000f 00015 (fac.go:3) SUBQ    $16, SP
    0x0013 00019 (fac.go:3) FUNCDATA    $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
    0x0013 00019 (fac.go:3) FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0013 00019 (fac.go:4) MOVQ    "".n+24(FP), CX
    0x0018 00024 (fac.go:4) CMPQ    CX, $2
    0x001c 00028 (fac.go:4) JLT $0, 70
    0x001e 00030 (fac.go:7) LEAQ    -1(CX), AX
    0x0022 00034 (fac.go:7) MOVQ    AX, (SP)
    0x0026 00038 (fac.go:7) PCDATA  $0, $0
    0x0026 00038 (fac.go:7) CALL    "".Fac(SB)
    0x002b 00043 (fac.go:7) MOVQ    8(SP), CX

Please have a look at load to DX below, then copy to AX. Later DX is no longer used. Instead we could load to AX in first place. Note: IMULQ needs AX operand.

    0x0030 00048 (fac.go:7) MOVQ    "".n+24(FP), DX
    0x0035 00053 (fac.go:7) MOVQ    DX, AX
    0x0038 00056 (fac.go:7) IMULQ   CX, AX
    0x003c 00060 (fac.go:7) MOVQ    AX, "".~r1+32(FP)
    0x0041 00065 (fac.go:7) ADDQ    $16, SP
    0x0045 00069 (fac.go:7) RET
    0x0046 00070 (fac.go:5) MOVQ    $1, "".~r1+32(FP)
    0x004f 00079 (fac.go:5) ADDQ    $16, SP
    0x0053 00083 (fac.go:5) RET
    0x0054 00084 (fac.go:5) NOP
    0x0054 00084 (fac.go:3) CALL    runtime.morestack_noctxt(SB)
    0x0059 00089 (fac.go:3) JMP 0

@gopherbot
Copy link

CL https://golang.org/cl/22160 mentions this issue.

@golang golang locked and limited conversation to collaborators Apr 20, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants