Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: performance regression on SSA #15934

Closed
dsnet opened this issue Jun 2, 2016 · 6 comments
Closed

cmd/compile: performance regression on SSA #15934

dsnet opened this issue Jun 2, 2016 · 6 comments

Comments

@dsnet
Copy link
Member

dsnet commented Jun 2, 2016

Using go1.7beta1

Consider the following benchmark:

package bzip2

import "bytes"
import "testing"

const mask  = 0xffffffffffff
const magic = 0x314159265359

var filter = func() (c [256]bool) {
    for i := uint(0); i < 8; i++ {
        c[0xff&(magic>>i)] = true
    }
    return c
}()

func BenchmarkSync(b *testing.B) {
    buf := bytes.Repeat([]byte{0x31, 0x41, 0x59, 0x26, 0x53, 0x59}, 1<<18)

    b.SetBytes(int64(len(buf)))
    b.ResetTimer()

    var ofs []int64
    for n := 0; n < b.N; n++ {
        ofs = ofs[:0]

        var sync uint64
        for i, b := range buf {
            sync = sync<<8 | uint64(b)
            if filter[byte(sync>>8)] && i >= 6 && (false ||
                sync&(mask<<7) == magic<<7 || sync&(mask<<8) == magic<<8 ||
                sync&(mask<<5) == magic<<5 || sync&(mask<<6) == magic<<6 ||
                sync&(mask<<3) == magic<<3 || sync&(mask<<4) == magic<<4 ||
                sync&(mask<<1) == magic<<1 || sync&(mask<<2) == magic<<2) {
                ofs = append(ofs, int64(i-6))
            }
        }
    }
}

When running without SSA:

$ go test -bench . -gcflags=-ssa=0 sync_test.go
BenchmarkSync-4          300       4131087 ns/op     380.74 MB/s

When running with SSA:

$ go test -bench . -gcflags=-ssa=1 sync_test.go
BenchmarkSync-4          300       5482100 ns/op     286.91 MB/s

This is a 25% decrease in performance and might be worth investigating why SSA is doing worse.

/cc @randall77

@dsnet dsnet added this to the Unplanned milestone Jun 2, 2016
@randall77 randall77 modified the milestones: Go1.8, Unplanned Jun 2, 2016
@randall77
Copy link
Contributor

This looks like a problem with placing spills conservatively. Quoting from a comment I put in regalloc.go:

            // TODO: schedule the spill at a point that dominates all restores.
            // The restore may be off in an unlikely branch somewhere and it
            // would be better to have the spill in that unlikely branch as well.
            // v := ...
            // if unlikely {
            //     f()
            // }
            // It would be good to have both spill and restore inside the IF.

In this case, the if is the len==cap test and f is growslice. It rarely happens that we need to call growslice, but we're spilling everything each iteration just in case we need to call growslice during that iteration. We should move the spills inside the if so they only happen if growslice is definitely called.

I'm surprised the old compiler does this optimization.

@randall77
Copy link
Contributor

See also #15845 , the fixes for these two bugs may be related.

@dsnet
Copy link
Member Author

dsnet commented Jun 2, 2016

It seems part of the problem then is related to #14758 as well.

However, if you remove the append completely, the SSA version is still ~15% slower.

@randall77
Copy link
Contributor

Weird. I see no substantive differences in the assembly with go1.6 and tip after I take the append out.

@randall77
Copy link
Contributor

...and I get only ~4% slower with tip.

@dsnet
Copy link
Member Author

dsnet commented Jun 2, 2016

I'll close this down since parts of the problem are captured in other issues. I'll reopen if I can produce a more succinct reproduction of a slow down.

@dsnet dsnet closed this as completed Jun 2, 2016
@golang golang locked and limited conversation to collaborators Jun 2, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants