Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/internal/gcimporter: too much CPU wasted on copying bytes #46564

Closed
g7r opened this issue Jun 4, 2021 · 7 comments
Closed

go/internal/gcimporter: too much CPU wasted on copying bytes #46564

g7r opened this issue Jun 4, 2021 · 7 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Milestone

Comments

@g7r
Copy link
Contributor

g7r commented Jun 4, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/sz/.cache/go-build"
GOENV="/home/sz/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/sz/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/sz/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/sz/sdk/go1.16.4"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/sz/sdk/go1.16.4/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.4"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3202951029=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I'm using go vet to do static checks on my project code. Said io.ReadAll wastes some 30% of time on my codebase. I've managed to reproduce the issue on smaller codebase. The code to reproduce is here: https://gist.githubusercontent.com/g7r/fef9fbf7577bb380de26587c08681b12/raw/1486d5d986f5f3222befca5197cee806632d82d2/repro_test.go. Run go test -bench . -benchtime 10000x -cpuprofile cpu.prof repro_test.go to gather profile.

What did you expect to see?

I expect not to see that large chunk of profile is dedicated to move memory while reading file into byte slice.

What did you see instead?

A lot of memory is being moved during file read. See attached profile screenshot: 2021-06-01-003911_3840x2073_scrot

I suggest we should remove io.ReadAll and use Reader in iImportData.

@mvdan
Copy link
Member

mvdan commented Jun 4, 2021

cc @dominikh who has profiled his static analysis tools for a while

@dominikh
Copy link
Member

dominikh commented Jun 4, 2021

I don't have much to say about this specific issue/"proposal", but I have a data point to contribute: I have a custom fork of gcimporter in Staticcheck, with the following changes to improve performance:

  • allow reuse of byte slices in GetExportData
  • only read type information from archives, skipping object code
  • read directly from an os.File when possible, skipping the indirection through a buffered reader
  • use unsafe for a more efficient implementation of (*token.File).SetLines

The 2nd and 3rd point likely have overlap with what's proposed here.

@dominikh
Copy link
Member

dominikh commented Jun 4, 2021

To comment on the suggestion to use io.Reader instead: with my changes, the amount of data read is quite reduced, making (the equivalent to) ReadAll much less expensive. Also, it's not a trivial change; IImportData is written around having random access to the data. An io.Reader alone wouldn't suffice, we'd also need seeking, and it would quite uglify the code. Mmap would be an alternative, if it didn't come with all the downsides of mmap.

@dr2chase dr2chase added this to the Backlog milestone Jun 4, 2021
@dr2chase dr2chase added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jun 4, 2021
@griesemer
Copy link
Contributor

cc: @findleyr @mdempsky

@mdempsky
Copy link
Member

mdempsky commented Jun 4, 2021

The compiler's importer (which gcimporter is derived from) simply mmaps the files and then lets them linger until process termination. This is acceptable for the compiler because it only reads a bounded number of exports. It's also necessary because the compiler simply returns string references that point directly into the mmap'd memory.

As @dominikh mentions, the export data format is structured around random access, so changing to an io.Reader isn't going to help. Changing to a ReaderAt would potentially work though without too much trouble: just change importReader.declReader into an io.SectionReader or something.

I've also designed a new export data format. When I get around to the go/types importer for it, I'll make sure to explore using ReaderAt instead of reading the file into memory as a byte slice.

@gopherbot
Copy link

Change https://golang.org/cl/325429 mentions this issue: go/internal/gcimporter: don't waste CPU copying bytes in io.ReadAll``

@g7r
Copy link
Contributor Author

g7r commented Jun 5, 2021

Sorry for delay with the pull request, had to figure out test breakage.

I've made a quick performance comparison of master branch and my PR:

# before
$ go test -bench . -benchtime 10000x repro_test.go                                                                                           
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkGCImport-12    	  10000	  2468308 ns/op
PASS
ok  	command-line-arguments	26.259s

# after
$ go test -bench . -benchtime 10000x repro_test.go                                                                                           
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkGCImport-12    	  10000	  1975409 ns/op
PASS
ok  	command-line-arguments	22.049s

I got it that the code is under a heavy refactoring now but the fix is pretty local and gives good boost. Actually, the boost is even bigger on our internal codebase.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Projects
None yet
Development

No branches or pull requests

8 participants