cmd/compile: cmd/compile/internal/gc.buildssa panic #38463

Closed
howardjohn opened this issue Apr 15, 2020 · 25 comments
Labels
FrozenDueToAge NeedsInvestigation release-blocker
Milestone
Unplanned

Comments

@howardjohn
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.14.2 linux/amd64

Does this issue reproduce with the latest release?

Yes, this was done on 1.14.2

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/gobin"
GOCACHE="/gocache"
GOENV="/home/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build048816231=/tmp/go-build -gno-record-gcc-switches"

This was built inside a docker image: gcr.io/istio-testing/build-tools:master-2020-04-10T20-55-56.

What did you do?

Tried to build our application.

What did you expect to see?

Build succeeds

What did you see instead?

STATIC=0 GOOS=windows LDFLAGS='-extldflags -static -s -w' common/scripts/gobuild.sh /tmp/tmp.W9NXZv7L5L/build/work/src/istio.io/istio/out/linux_amd64/release/istioctl-win.exe ./istioctl/cmd/istioctl
go: downloading github.com/inconshreveable/mousetrap v1.0.0
# k8s.io/apimachinery/pkg/util/runtime
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x74 pc=0xce6f4f]
goroutine 8 [running]:
cmd/compile/internal/gc.buildssa(0x0, 0x3, 0x0)
	/usr/local/go/src/cmd/compile/internal/gc/ssa.go:297 +0xdf
cmd/compile/internal/gc.compileSSA(0x0, 0x3)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:298 +0x5d
cmd/compile/internal/gc.compileFunctions.func2(0xc0000ef4a0, 0xc000018e10, 0x3)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:363 +0x49
created by cmd/compile/internal/gc.compileFunctions
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:361 +0x128

Full CI log: https://prow.istio.io/view/gcs/istio-prow/pr-logs/pull/istio_istio/22958/release-test_istio/11141
Rerunning the job with the same code the build succeeds: https://prow.istio.io/view/gcs/istio-prow/pr-logs/pull/istio_istio/22958/release-test_istio/11153

The referenced gobuild.sh script is https://github.com/istio/istio/blob/master/common/scripts/gobuild.sh.

I think the fact that it's in the compile code path means it's a Go bug and not an issue with our code, but maybe not. Let me know if more info is needed.

@randall77
Contributor

Definitely looks like a bug in our compiler. The pointer to the function to be compiled is nil.

Is this reproducible? Does it happen every time, or is it intermittent?

Did this happen in earlier versions of Go? (1.14.{0,1} and 1.13.10 would be good ones to try if you haven't yet.)

I'll try to reproduce this myself.

@randall77
Contributor

How do you populate ./istioctl/cmd/istioctl? I don't see anything in the log that makes that directory tree.

@howardjohn
Contributor Author

@randall77 I have only seen this a single time in our CI, and rerunning the exact same job it did not occur, so it's intermittent. I am not sure whether it has occurred more than once, since we don't have a way to query through all of our test logs and I have only seen it the one time.

We upgraded our CI to Go 1.14.2 yesterday and previously were using 1.13.10, so it's possible this is related to that change. I'll update this issue if we see it again.

The CI job calls this make target: https://github.com/istio/istio/blob/e33092b5c59d1c13fb11d86ab6345f45194133d3/Makefile.core.mk#L403, which calls https://github.com/istio/istio/blob/e33092b5c59d1c13fb11d86ab6345f45194133d3/Makefile.core.mk#L385, which calls gobuild.sh ./istioctl/cmd/istioctl, which in turn runs go build ./istioctl/cmd/istioctl with some flags.

@randall77
Copy link
Contributor

Sorry, I was asking about the source code. Presumably there are some .go files in an istioctl/cmd/istioctl/ directory somewhere. Where can I get that?

If this is intermittent, it is going to be hard to debug. There aren't any obvious ways a nil pointer might make its way into the compile queue.

@cuonglm
Member

cuonglm commented Apr 15, 2020

@randall77 Maybe this is a sign of #33485.

@howardjohn
Contributor Author

Ah, I see. That would be https://github.com/istio/istio/tree/master/istioctl/cmd/istioctl. I'll keep an eye out for any recurrences as well; I totally understand this isn't a lot of info to go on.

@andybons added the NeedsInvestigation label Apr 16, 2020
@andybons added this to the Unplanned milestone Apr 16, 2020
@randall77
Contributor

@mdempsky
Member

mdempsky commented Aug 27, 2020

Based on my comments in #33485, I wonder if it's related to the call to dtypesym in gc/ssa.go? In particular, could calling dtypesym be causing functions to get pushed onto compilequeue, which would be bad since that slice is being concurrently accessed by the goroutine responsible for distributing functions to the worker goroutines?

Edit: Maybe not. There's already a check for Siggen just before the call to dtypesym.
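
For illustration, here is a minimal Go sketch (illustrative names, not the compiler's actual code) of the hazard described above: one goroutine distributes functions from a shared queue slice to worker goroutines while another code path appends to that same slice. Without synchronization around both the append and the distribution loop, a worker could observe a half-updated slice and receive a nil function, which is the failure mode in the stack traces above.

package main

import "sync"

type fn struct{ name string }

var (
	mu    sync.Mutex
	queue []*fn // stands in for gc.compilequeue
)

func enqueue(f *fn) {
	mu.Lock()
	queue = append(queue, f) // safe only because every reader also holds mu
	mu.Unlock()
}

func compileAll(workers int) {
	c := make(chan *fn, len(queue))
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range c {
				_ = f.name // "compile" f; a nil f here would panic much like buildssa did
			}
		}()
	}
	// The hazard: if something like a genwrapper call triggered by dtypesym
	// appended to queue concurrently with this loop and without the lock,
	// the distribution below would race with that append.
	mu.Lock()
	for _, f := range queue {
		c <- f
	}
	queue = nil
	mu.Unlock()
	close(c)
	wg.Wait()
}

func main() {
	for i := 0; i < 8; i++ {
		enqueue(&fn{name: "f"})
	}
	compileAll(4)
}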

@toothrot
Contributor

@randall77 @mdempsky Checking in on this as a 1.16 release blocker. Has anybody managed to reproduce this yet?

@randall77
Contributor

Nope. If I had, I would post it here.
Any ideas are welcome. This is something of a stumper.

@cuonglm
Member

cuonglm commented Sep 12, 2020

This happens again in https://build.golang.org/log/e062e1dff6ab680d17246672205cecd010d98e83

##### ../misc/reboot
Building Go cmd/dist using /workdir/go. (devel 2c95e3a6a8377ca9c72608c25b4cf2506baf782f linux/amd64)
Building Go toolchain1 using /workdir/go.
# bootstrap/cmd/link/internal/riscv64
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0xabac1c]

goroutine 12 [running]:
cmd/compile/internal/gc.buildssa(0x0, 0x1, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/ssa.go:315 +0xfc
cmd/compile/internal/gc.compileSSA(0x0, 0x1)
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:317 +0x5d
cmd/compile/internal/gc.compileFunctions.func2(0xc000767380, 0xc0005054a0, 0x1)
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:382 +0x4d
created by cmd/compile/internal/gc.compileFunctions
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:380 +0x129
go tool dist: FAILED: /workdir/go/bin/go install -gcflags=-l -tags=math_big_pure_go compiler_bootstrap bootstrap/cmd/...: exit status 2
--- FAIL: TestRepeatBootstrap (17.64s)
    reboot_test.go:50: exit status 2
FAIL

@gopherbot

Change https://golang.org/cl/254617 mentions this issue: cmd/compile: make funccompile un-reentrant

gopherbot pushed a commit that referenced this issue Sep 15, 2020
Currently, there's an awkward reentrancy issue with funccompile:

    funccompile -> compile -> dtypesym -> geneq/genhash/genwrapper -> funccompile

Though it's not a problem at this moment, some attempts by @mdempsky to
move order/walk/instrument into buildssa have failed, due to SSA cache
corruption.

This commit fixes that reentrancy issue by pumping generated functions
through the same compile workqueue that normal functions go through. We
do this by adding them to xtop, instead of calling funccompile directly
in geneq/genhash/genwrapper. In dumpdata, we look for uncompiled
functions in xtop instead of compilequeue, then finish compiling them.

Updates #38463
Fixes #33485

Change-Id: Ic9f0ce45b56ae2ff3862f17fd979253ddc144bb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/254617
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Keith Randall <khr@golang.org>
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
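
For reference, a minimal Go sketch of the general pattern this commit describes (illustrative names such as top, genWrapper, and compile; not the compiler's real code): instead of compiling generated helper functions reentrantly from inside compile, they are appended to a top-level list and picked up by a later sweep, the way the commit routes generated functions through xtop and the normal workqueue.

package main

import "fmt"

type fn struct {
	name     string
	compiled bool
}

var top []*fn // stands in for xtop: functions awaiting compilation

// genWrapper previously would have compiled the new function immediately
// (the reentrant path); here it only records it for later compilation.
func genWrapper(name string) *fn {
	f := &fn{name: name}
	top = append(top, f)
	return f
}

func compile(f *fn) {
	// Compiling one function may generate helpers (equality/hash/wrapper
	// functions); those are queued rather than compiled reentrantly.
	if f.name == "main.main" {
		genWrapper("type.eq.T")
	}
	f.compiled = true
}

func main() {
	top = append(top, &fn{name: "main.main"})
	// A dumpdata-style sweep: keep scanning for uncompiled functions,
	// including any appended while earlier ones were being compiled.
	for i := 0; i < len(top); i++ {
		if !top[i].compiled {
			compile(top[i])
		}
	}
	for _, f := range top {
		fmt.Println(f.name, f.compiled)
	}
}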
@randall77
Contributor

If this was caused by some general corruption problem in the compiler, it might be related to #41257.

@bancek

bancek commented Oct 1, 2020

I've also triggered this with Go 1.14.4 in CI (go build running in Docker). I was not able to reproduce it; rerunning the build succeeded.

$ go version
go version go1.14.4 linux/amd64

$ go build main.go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x74 pc=0xce6fff]

goroutine 7 [running]:
cmd/compile/internal/gc.buildssa(0x0, 0x2, 0x0)
	/usr/local/go/src/cmd/compile/internal/gc/ssa.go:297 +0xdf
cmd/compile/internal/gc.compileSSA(0x0, 0x2)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:298 +0x5d
cmd/compile/internal/gc.compileFunctions.func2(0xc007104240, 0xc0011ab390, 0x2)
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:363 +0x49
created by cmd/compile/internal/gc.compileFunctions
	/usr/local/go/src/cmd/compile/internal/gc/pgen.go:361 +0x128

@toothrot
Contributor

@randall77 Should we remove the release-blocker label if this is not reproducible?

@randall77
Contributor

@toothrot No, I think we've seen enough instances that this is definitely something worth figuring out.

@cuonglm
Member

cuonglm commented Oct 16, 2020

> @toothrot No, I think we've seen enough instances that this is definitely something worth figuring out.

Can we check if this still happens in builds after CL 254617 above?

@aclements
Member

@randall77, would it be worth stress-testing the compiler under the race detector to see if this is a race?

@randall77
Contributor

I've already tried running the compiler under the race detector and didn't get anything. I didn't really stress it, though; it may be worth banging on it harder.

@dmitshur added the okay-after-beta1 label Nov 19, 2020
@aclements
Member

This hasn't happened in a while:

$ greplogs -dashboard -E "^goroutine.*running.*\n.*gc\.buildssa" -l -md
2020-09-12T08:31:49-2c95e3a/linux-amd64-ssacheck
2020-08-14T17:45:39-a20cb4c/linux-amd64-nocgo
2020-05-10T14:43:46-57e32c4/openbsd-386-64
2020-03-27T21:13:06-9ceb1e5/freebsd-386-12_0
2020-02-05T22:36:18-9ee5174/freebsd-386-11_2

But it was rare enough that it's hard to say statistically that it's gone.

$ greplogs -dashboard -E "^goroutine.*running.*\n.*gc\.buildssa" -l | findflakes -paths
First observed 9ee5174 05 Feb 22:36 2020 (2659 commits ago)
Last observed  2c95e3a 12 Sep 08:31 2020 (939 commits ago)
11% chance failure is still happening
0.23% failure probability (5 of 1721 commits)
Likely culprits:
   ... (all useless) ...
No known past failures

There were some non-determinism fixes a month ago that may have fixed this.

This is so unreproducible that it may be hopeless, but we could try running the compiler under GODEBUG=gccheckmark=1 to see if there's a GC issue.

@aclements
Member

My earlier regexp had missed one that was indented. The chance this is still happening is getting pretty small:

$ greplogs -dashboard -e "^\s*goroutine.*running.*\n.*gc\.buildssa" -l | findflakes -paths
First observed 9ee5174 05 Feb 22:36 2020 (2774 commits ago)
Last observed  2c95e3a 12 Sep 08:31 2020 (1054 commits ago)
4.6% chance failure is still happening
0.29% failure probability (6 of 1721 commits)
Likely culprits:
   ...
No known past failures

@toothrot removed the okay-after-beta1 label Dec 17, 2020
@aclements
Member

Fading away into the distance...

$ greplogs -dashboard -e "^\s*goroutine.*running.*\n.*gc\.buildssa" -l | findflakes -paths
First observed 9ee5174 05 Feb 22:36 2020 (2812 commits ago)
Last observed  2c95e3a 12 Sep 08:31 2020 (1092 commits ago)
4.2% chance failure is still happening
0.29% failure probability (6 of 1721 commits)
Likely culprits:
    ...
No known past failures

@mdempsky
Member

Does greplogs/findflakes take into account the dev branches? E.g., dev.regabi and dev.typeparams have a few hundred more commits on top of master, which should provide additional testing samples.

@aclements
Member

They do not. Actually, fetchlogs only grabs the master branch, but even if it grabbed everything, findflakes would get completely confused if there were multiple branches in its input. Now you have me thinking about how to do the time series analysis when there are multiple converging and diverging timelines. (These tools could really use a revisit! They're held together with duct tape right now.)

@aclements
Member

$ greplogs -dashboard -e "^\s*goroutine.*running.*\n.*gc\.buildssa" -l | findflakes -paths
First observed 9ee5174 05 Feb 22:36 2020 (2842 commits ago)
Last observed  2c95e3a 12 Sep 08:31 2020 (1122 commits ago)
3.8% chance failure is still happening
0.29% failure probability (6 of 1721 commits)

I'm calling this fixed.

@golang locked and limited conversation to collaborators Jan 6, 2022