Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: loops compiled differently when using range or index #31205

Open
mariecurried opened this issue Apr 2, 2019 · 5 comments
Open
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@mariecurried
Copy link

What version of Go are you using (go version)?

$ go version
go version devel +4091cf9 Sun Mar 31 23:35:35 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

I compiled two functions that sum all the elements of a slice of bytes using "range" or using an index:

"Range" version:

func sumRange(slc []byte) byte {
    res := byte(0)
    for _, v := range slc {
        res += v
    }
    return res
}

Index version:

func sumIndex(slc []byte) byte {
    res := byte(0)
    for i := 0; i < len(slc); i++ {
        res += slc[i]
    }
    return res
}

What did you expect to see?

I expected both versions of the function to compile to the exact same code.

What did you see instead?

Instead, the looping part of the function is different, resulting in differences in performance:

"Range" version:

movblzx (CX)(DX*1), SI
incq    DX
addl    SI, BX

Index version:

leaq    1(DX), SI
movblzx (CX)(DX*1), DI
addl    DI, BX
movq    SI, DX
@mundaym
Copy link
Member

mundaym commented Apr 2, 2019

What is the performance difference?

@mariecurried
Copy link
Author

mariecurried commented Apr 2, 2019

Using the following benchmark (https://play.golang.org/p/rCxWqmQXvq5), I get these numbers:

BenchmarkSumRange-8   	    3000	    437643 ns/op
BenchmarkSumIndex-8   	    3000	    482681 ns/op

So, using an explicit index seems to be slower than using "range".

@mariecurried
Copy link
Author

Something I noticed is that, if I write "sumRange" as follows, the difference disappears:

func sumRange(slc []byte) byte {
    res := byte(0)
    for i := range slc {
        res += slc[i]
    }
    return res
}

@bradfitz bradfitz added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Apr 2, 2019
@bradfitz bradfitz added this to the Unplanned milestone Apr 2, 2019
@bradfitz
Copy link
Contributor

bradfitz commented Apr 2, 2019

/cc @randall77 @josharian @cherrymui

@randall77
Copy link
Contributor

This appears to occur because of choices in the scheduler.
The inner loop in sumRange schedules like this:

v19 (5) = MOVBloadidx1 <byte> v26 v10 v1 (v[byte])
v24 (+5) = ADDQconst <int> [1] v10
v21 (+6) = ADDL <byte> v30 v19 (res[byte])

The inner loop in sumIndex schedules like this:

v27 (+13) = ADDQconst <int> [1] v11 (i[int])
v23 (+14) = MOVBloadidx1 <byte> v18 v11 v1
v24 (14) = ADDL <byte> v33 v23 (res[byte])

The schedule in sumIndex is a problem because the ADDQconst can't overwrite its argument, as the argument is not dead yet. So it has to target a different register and then get copied back.

This has been an issue for a while. I noted it in 2016: #16122 (comment)

Perhaps just a tweak to the scheduler is needed.

I'm unclear as to why the scheduler makes different choices here. Might just be how the values happened to be ordered on entry to the scheduler.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

5 participants