Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile/internal/pgo: Detect sample value position instead of hard-coding #58292

Closed
brancz opened this issue Feb 3, 2023 · 11 comments
Closed
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@brancz
Copy link
Contributor

brancz commented Feb 3, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20 darwin/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN="/Users/brancz/bin"
GOCACHE="/Users/brancz/Library/Caches/go-build"
GOENV="/Users/brancz/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/brancz/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/brancz"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/brancz/src/github.com/parca-dev/parca/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/rl/8sh8jb5s1c1fl77ztzy4zw6m0000gn/T/go-build3824178585=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I tried out the 1.20 PGO feature using a merged pprof profile downloaded from a Parca server.

What did you expect to see?

Compile successfully, just like when supplying a runtime-generated pprof file.

What did you see instead?

The compiler errored several times with repeatedly printing errors like this:

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/cpu
/usr/local/go/src/internal/cpu/cpu.go:12:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

It turns out the sample position for CPU samples within a profile is hard coded to be index 1 (understandable since that's exactly what a profile looks like when generated by the runtime as it creates CPU nanoseconds on index 0 and CPU samples on index 1:

SampleValue: func(v []int64) int64 { return v[1] },

Minimal reproducible example:

package main

import (
	"log"
	"os"
	"runtime/pprof"

	"github.com/google/pprof/profile"
)

func main() {
	const filename = "pgotest.prof"

	f, err := os.Create(filename)
	if err != nil {
		log.Fatal("could not create CPU profile: ", err)
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal("could not start CPU profile: ", err)
	}

	hot()

	pprof.StopCPUProfile()

	f.Seek(0, 0)
	p, err := profile.Parse(f)
	if err != nil {
		log.Fatal("could not parse profile: ", err)
	}

	if err := f.Close(); err != nil {
		log.Fatal("could not close profile: ", err)
	}

	p.SampleType = []*profile.ValueType{p.SampleType[1]}
	for _, s := range p.Sample {
		s.Value = s.Value[1:]
	}

	if err := os.Remove(filename); err != nil {
		log.Fatal("could not remove profile: ", err)
	}

	f, err = os.Create(filename)
	if err != nil {
		log.Fatal("could not create CPU profile: ", err)
	}
	defer f.Close()

	if err := p.Write(f); err != nil {
		log.Fatal("could not write profile: ", err)
	}
}

func hot() {
	for i := 0; i < 10_000_000_000; i++ {
	}
}

Compile the program and run it:

go build pgotest.go
./pgotest

Then attempt to compile with PGO:

go build -pgo pgotest.prof pgotest.go
# internal/race
/usr/local/go/src/internal/race/norace.go:43:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/goarch
/usr/local/go/src/internal/goarch/zgoarch_arm64.go:32:7: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/goos
/usr/local/go/src/internal/goos/zgoos_darwin.go:25:7: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/coverage/rtcov
/usr/local/go/src/internal/coverage/rtcov/rtcov.go:31:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/unsafeheader
/usr/local/go/src/internal/unsafeheader/unsafeheader.go:34:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# unicode/utf8
/usr/local/go/src/unicode/utf8/utf8.go:65:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/cpu
/usr/local/go/src/internal/cpu/cpu.go:12:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/itoa
/usr/local/go/src/internal/itoa/itoa.go:18:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# math/bits
/usr/local/go/src/math/bits/bits.go:44:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/goexperiment
/usr/local/go/src/internal/goexperiment/flags.go:57:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# sync/atomic
/usr/local/go/src/sync/atomic/type.go:40:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# unicode
/usr/local/go/src/unicode/casetables.go:13:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new

The fix is pretty straight forward, PGO should detect at which position in p.SampleType is .Type == "samples" && .Unit == "count".

Happy to submit a fix.

@brancz
Copy link
Contributor Author

brancz commented Feb 3, 2023

Just to make extra sure, I double checked and indeed when inserting a placeholder on the 0 index it works: https://gist.github.com/brancz/9e56f8a302d6e196f57c4953156595b5

brancz added a commit to polarsignals/go that referenced this issue Feb 3, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
brancz added a commit to polarsignals/go that referenced this issue Feb 3, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
brancz added a commit to polarsignals/go that referenced this issue Feb 3, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
@gopherbot
Copy link

Change https://go.dev/cl/465135 mentions this issue: cmd/compile/internal/pgo: fix hard-coded PGO sample data position

@prattmic prattmic added this to the Go1.21 milestone Feb 3, 2023
@prattmic
Copy link
Member

prattmic commented Feb 3, 2023

cc @cherrymui @aclements

I think we may want to backport this to 1.20. It is a simple fix. Thoughts?

As an aside, I documented this incorrectly at https://go.dev/doc/pgo#alternative-sources, saying we use sample index 0, which is wrong. (For Go CPU profiles it doesn't matter because index 0 ("cpu") is just a scaled version of index 1 ("samples")).

@prattmic prattmic added release-blocker NeedsFix The path to resolution is known, but the work has not been done. labels Feb 3, 2023
@brancz
Copy link
Contributor Author

brancz commented Feb 3, 2023

I'd love to see it as a bug fix! I did indeed go by that document. I think samples count is a good choice though, it's a simpler unit to export from profilers, as it tends to be the raw data (in our case captured from Linux perf_events).

brancz added a commit to polarsignals/go that referenced this issue Feb 3, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
brancz added a commit to polarsignals/go that referenced this issue Feb 3, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
@cherrymui cherrymui changed the title compile/internal/pgo: Detect sample value position instead of hard-coding cmd/compile/internal/pgo: Detect sample value position instead of hard-coding Feb 3, 2023
@cherrymui
Copy link
Member

Thanks for the issue and the CL!

I think it is safe for backport. We usually don't backport performance stuff, but this is a new feature and it could cause build failure, maybe this is fine.

@prattmic
Copy link
Member

prattmic commented Feb 3, 2023

@gopherbot please backport to 1.20. Profiles with only a single sample type can an ICE.

@gopherbot
Copy link

Backport issue(s) opened: #58309 (for 1.20).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

brancz added a commit to polarsignals/go that referenced this issue Feb 4, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
brancz added a commit to polarsignals/go that referenced this issue Feb 8, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
@gopherbot
Copy link

Change https://go.dev/cl/466515 mentions this issue: _content/doc/pgo: correct sample index requirements

brancz added a commit to polarsignals/go that referenced this issue Feb 8, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

Fixes golang#58292
@gopherbot
Copy link

Change https://go.dev/cl/466438 mentions this issue: [release-branch.go1.20] cmd/compile/internal/pgo: fix hard-coded PGO sample data position

@gopherbot
Copy link

Change https://go.dev/cl/466675 mentions this issue: cmd/compile/internal/pgo: fix hard-coded PGO sample data position

@gopherbot
Copy link

Change https://go.dev/cl/467375 mentions this issue: [release-branch.go1.20] cmd/compile/internal/pgo: fix hard-coded PGO sample data position

gopherbot pushed a commit that referenced this issue Feb 10, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

This is a redo of CL 465135, now allowing empty profiles. Note that
preprocessProfileGraph will already cause pgo.New to return nil for
empty profiles.

Fixes #58292

Change-Id: Ia6c94f0793f6ca9b0882b5e2c4d34f38e600c1e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/466675
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Feb 10, 2023
…sample data position

This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

This is a redo of CL 465135, now allowing empty profiles. Note that
preprocessProfileGraph will already cause pgo.New to return nil for
empty profiles.

For #58292
For #58309

Change-Id: Ia6c94f0793f6ca9b0882b5e2c4d34f38e600c1e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/467375
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
johanbrandhorst pushed a commit to Pryz/go that referenced this issue Feb 12, 2023
This patch detects at which index position profiling samples that have the value-type samples count are, instead of the previously hard-coded position of index 1. Runtime generated profiles always generate CPU profiling data with the 0 index being CPU nanoseconds, and samples count at index 1, which is why this previously hasn't come up.

Fixes golang#58292

Change-Id: Idde761d53b02259f3076c4e5dcb4a96a9d53df0e
GitHub-Last-Rev: dabbf9f
GitHub-Pull-Request: golang#58294
Reviewed-on: https://go-review.googlesource.com/c/go/+/465135
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
johanbrandhorst pushed a commit to Pryz/go that referenced this issue Feb 12, 2023
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

This is a redo of CL 465135, now allowing empty profiles. Note that
preprocessProfileGraph will already cause pgo.New to return nil for
empty profiles.

Fixes golang#58292

Change-Id: Ia6c94f0793f6ca9b0882b5e2c4d34f38e600c1e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/466675
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Feb 14, 2023
…sample data position

This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.

This is a redo of CL 465135, now allowing empty profiles. Note that
preprocessProfileGraph will already cause pgo.New to return nil for
empty profiles.

For golang#58292
For golang#58309

Change-Id: Ia6c94f0793f6ca9b0882b5e2c4d34f38e600c1e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/467375
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot pushed a commit to golang/website that referenced this issue Feb 15, 2023
The current requirement is completely wrong. In 1.20, we unconditionally
look at sample index 1, which should either be samples/count or
cpu/nanoseconds (they have the same semantic meaning).

For golang/go#58292 (hopefully in 1.20.1), we are correcting this to
pick the first index. I am omitting describing this slight version
discrepancy due to lack of space, but I could be convinced to add it.

For golang/go#58292

Change-Id: I24bb125db50a254b18c3f5f21db0c7fdc2125229
Reviewed-on: https://go-review.googlesource.com/c/website/+/466515
Reviewed-by: Frederic Branczyk <fbranczyk@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Feb 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants