Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: unrolled loop results in extra loads/stores (suboptimal schedule?) #33251

Closed
jeremyfaller opened this issue Jul 24, 2019 · 6 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@jeremyfaller
Copy link
Contributor

What version of Go are you using (go version)?

go version devel +688f3eacea Mon Jul 22 14:19:18 2019 -0400 darwin/amd64

Does this issue reproduce with the latest release?

reproduces with slightly old sync of HEAD

What operating system and processor architecture are you using (go env)?

go env Output

GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/jfaller/go/bin"
GOCACHE="/Users/jfaller/Library/Caches/go-build"
GOENV="/Users/jfaller/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/jfaller/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/jfaller/src/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/jfaller/src/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/kt/9h7gfgy955lgbm_cys108l2c002pwh/T/go-build625773973=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

While fooling around with compiler optimization passes, I found that unrolling a dot-product yields code that generates lots of extraneous loads/stores. I've been doing some digging in, and believe the schedule pass in the compiler generates a suboptimal schedule.

What did you expect to see?

I expected relatively straight-line LOAD/LOAD/MUL/ADD code.

What did you see instead?

The compiler generates a ton of extraneous load/stores.

prod.tar.gz

@randall77
Copy link
Contributor

Yes, it looks like the scheduler is too aggressive moving loads earlier. Too many early loads means it has to immediately spill the loads, then restore them.

@randall77 randall77 added this to the Go1.14 milestone Jul 24, 2019
@randall77 randall77 changed the title cmd/compile: unrolled loop results in extra loads/stores (bad schedule?) cmd/compile: unrolled loop results in extra loads/stores (suboptimal schedule?) Jul 24, 2019
@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 29, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@andybons
Copy link
Member

andybons commented Nov 3, 2020

@jeremyfaller can you outline the exact steps you took to generate the ssa.html file? Thanks :)

@jeremyfaller
Copy link
Contributor Author

It's env variable. You give it the name of the function you want the SSA for:

https://dave.cheney.net/2020/06/19/how-to-dump-the-gossafunc-graph-for-a-method

So for this instance, I think it was GOSSAFUNC=$FUNC go tool compile XXX.go.

When I filed this, I did some looking at this, it looked relatively simple to me. This is from memory, and likely to be wrong, but I think there's a heap in the schedule pass that just needs different priorities for the load/stores. (Or the heap isn't there, and that's what I was going to add.) From there it was just expanding the testing, etc. Seemed like a pretty simple fix.

@jeremyfaller
Copy link
Contributor Author

Yeah, looking at src/cmd/compile/internal/ssa/schedule.go there's a heap. I think I thought it was as simple as turning that ValHeap into using float32 for scores, and just encoding the linenumber in decimal points just gently encourage "earlier in the code" to "earlier to schedule." It probably wasn't the best fix, but it would start the discussion, anyway.

@jeremyfaller
Copy link
Contributor Author

My original proposal of using the souce line number will not work. And actually, it's already implemented in the heap in the schedule pass.

I did a little bit of digging into this issue over the holidays, and it looks slightly more complicated than I originally laid out. The SSA load instructions have dependencies between them, so sequential loads are all marked as dependent on one another. As such, the scheduler has a hard time moving them around. When I get spare cycles, I'll likely take a further look at this.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@randall77
Copy link
Contributor

This is fixed at tip, probably from the new scheduler (CL 270940) plus CL 509856.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

6 participants