Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: fatal error: releasep: invalid p state #15246

Closed
mikioh opened this issue Apr 12, 2016 · 22 comments
Closed

runtime: fatal error: releasep: invalid p state #15246

mikioh opened this issue Apr 12, 2016 · 22 comments

Comments

@mikioh
Copy link
Contributor

mikioh commented Apr 12, 2016

On the solaris-amd64-smartos buildbot. See http://build.golang.org/log/3e4aa690c01a4123a508595aa248329b26499b18.

ok      log 0.018s
ok      log/syslog  2.047s
ok      math    0.029s
releasep: m=0xc82044a000 m->p=0xc820022000 p->m=859534827520 m->mcache=0xfffffd7fff131a30 p->mcache=0xfffffd7fff131a30 p->status=3
fatal error: releasep: invalid p state

runtime stack:
runtime.throw(0x634c35, 0x19)
    /root/work/solaris-amd64-smartos-73a0185ad390/go/src/runtime/panic.go:566 +0x8b
runtime.releasep(0x2000)
    /root/work/solaris-amd64-smartos-73a0185ad390/go/src/runtime/proc.go:3365 +0x208
runtime.gcstopm()
    /root/work/solaris-amd64-smartos-73a0185ad390/go/src/runtime/proc.go:1760 +0x42
runtime.schedule()
    /root/work/solaris-amd64-smartos-73a0185ad390/go/src/runtime/proc.go:2070 +0x105
runtime.park_m(0xc82044d380)
    /root/work/solaris-amd64-smartos-73a0185ad390/go/src/runtime/proc.go:2169 +0x11a
runtime.mcall(0x0)
    /root/work/solaris-amd64-smartos-73a0185ad390/go/src/runtime/asm_amd64.s:240 +0x64
(snip)

/CC looks like runtime/GC gatekeepers (who?). It expected _Prunning but got _Pgcstop.

@mikioh mikioh added this to the Go1.7 milestone Apr 12, 2016
@mikioh
Copy link
Contributor Author

mikioh commented Apr 12, 2016

/CC @aclements (from the dying message invalid p state and #10240)

@davecheney
Copy link
Contributor

I just hit this as well

go version devel +a4650a2 Mon Apr 11 21:38:18 2016 +0000 linux/amd64

# github.com/juju/juju/provider/joyent
releasep: m=0xc820062000 m->p=0xc82001f500 p->m=859530731520 m->mcache=0x7ff2acede4b0 p->mcache=0x7ff2acede4b0 p->status=3
fatal error: releasep: invalid p state

runtime stack:
runtime.throw(0x12ce3b2, 0x19)
        /home/dfc/go/src/runtime/panic.go:566 +0x8b
runtime.releasep(0xc00000001)
        /home/dfc/go/src/runtime/proc.go:3365 +0x208
runtime.gcstopm()
        /home/dfc/go/src/runtime/proc.go:1760 +0x42
runtime.schedule()
        /home/dfc/go/src/runtime/proc.go:2070 +0x105
runtime.park_m(0xc82008ea80)
        /home/dfc/go/src/runtime/proc.go:2169 +0x11a
runtime.mcall(0x0)
        /home/dfc/go/src/runtime/asm_amd64.s:240 +0x5b

This feels like a recent regression.

@davecheney
Copy link
Contributor

I suspect a4650a2, /cc @josharian

@mikioh mikioh changed the title runtime: fatal error: releasep: invalid p state on solaris/amd64 runtime: fatal error: releasep: invalid p state Apr 12, 2016
@josharian
Copy link
Contributor

Ugh. Will take a look, although today promises to be busy. How reliably can you reproduce? Any hints for doing so? If reliably, can you confirm that it was the append changes?

@aclements
Copy link
Member

It's not specific to Solaris. It did start right after a4650a2. It's not immediately obvious to me how that would have caused this, but statistically speaking, it's a likely candidate.

$ greplogs -l -md -dashboard -E 'releasep: invalid p state'
2016-04-11T22:35:22-d1feddb/linux-amd64-clang
2016-04-11T23:15:02-501ddf7/linux-amd64-kube
2016-04-12T01:07:46-73a0185/solaris-amd64-smartos

$ greplogs -l -dashboard -E 'releasep: invalid p state' | findflakes -paths
First observed d1feddb 11 Apr 22:35 2016 (11 commits ago)
Last observed  73a0185 12 Apr 01:07 2016 (6 commits ago)
4.7% chance failure is still happening
40% failure probability (3 of 6 commits)
Likely culprits:
   40% d1feddb cmd/vet: improve documentation for flags, slightly
   24% a4650a2 cmd/compile: avoid write barrier in append fast path
   14% 32efa16 cmd/compile: added stats printing to stackalloc
    9% 7e40627 cmd/compile: zero all three argstorage slots
    5% 7f53391 cmd/compile: fix -N build
No known past failures

@josharian
Copy link
Contributor

I'm puzzled too, although spidey sense says missing VARDEF. When I'm at a computer (or if one of you beats me to it) next, we can temporarily disable the inplace append code with a well-placed && false to get this stable again.

@josharian
Copy link
Contributor

I can't seem to reproduce on darwin, which is going to make this a bear to track down. Mailed CL 21891 to disable in the meantime.

@gopherbot
Copy link

CL https://golang.org/cl/21891 mentions this issue.

@aclements
Copy link
Member

I'll see if I can reproduce on linux/amd64.

@aclements
Copy link
Member

That was surprisingly easy. It looks like it was indeed the append optimization. With the append optimization (811ebb6^) I get 3 failures out of 20 runs in stress -p 1 go test -short std. Without the append optimization (811ebb6), I get 0 failures out of 20 runs. I'll let it keep going, but that's not likely to happen by chance.

@aclements
Copy link
Member

0 failures out of 100 runs with the append optimization disabled.

@josharian
Copy link
Contributor

I can't reproduce locally. Maybe my five year old laptop is too slow, or maybe it's darwin.

In any case, I'd appreciated it greatly if someone who can get this to fail reliably for them would try patching in CL 21964. Thanks!

@aclements
Copy link
Member

Hmm. 24 "invalid p state" failures out of 1,476 runs with CL 21964. Plus 45 other failures. :(

I'm going to do an overnight run without CL 21964 to see if I can reproduce it without the append optimization in a large number of runs.

@josharian
Copy link
Contributor

Ouch. Thanks so much for doing this, @aclements, much appreciated. I look forward to the results.

@davecheney
Copy link
Contributor

Thanks @aclements I was able to trigger this once, but haven't been able to trigger it again before the CL was rolled back.

@davecheney
Copy link
Contributor

@josharian it is possible that the problem doesn't manifest on osx because of dumb luck and or inadvertent memory barriers because of differences in other runtime interactions with the operating system.

@aclements
Copy link
Member

0 "invalid p state" failures out of 2,787 runs without CL 21964 (3f66d8c, the parent of CL 21964). (20 other failures; mostly "bind: address already in use" at dial_test.go:753 in net.)

@josharian
Copy link
Contributor

Thanks, @aclements. I have some new theories, but I clearly need to be able to test on my own now instead of bothering you. :)

I tried to reproduce on GCE with a small machine running jessie; after an overnight run of 218 with CL 21964, I have no failures. At the rate you reported, I should have seen a handful. I'll leave it running for a while longer, but suggestions/ideas for setting up to reproduce myself would be welcome.

@josharian
Copy link
Contributor

I am set up to reproduce, but my reproduction rate is not as high as yours, @aclements. If it is easy (and only if it is easy), would you mind patching in CL 22197 and giving it a stress run as well? If not, no prob, I'll leave mind running overnight anyway.

@aclements
Copy link
Member

My machine's busy running benchmarks, but I've queued up a stress test of master on it, which should run tonight.

@josharian
Copy link
Contributor

My stress tests look good, and the CL has just been submitted, so I think
you can unqueue it. (Or feel free to leave it.) Thanks again for all the
fish.

On Tue, Apr 19, 2016 at 2:31 PM, Austin Clements notifications@github.com
wrote:

My machine's busy running benchmarks, but I've queued up a stress test of
master on it, which should run tonight.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#15246 (comment)

@aclements
Copy link
Member

Ran overnight on my machine with 0 failures (other than the flaky net failures) out of ~2300 runs.

@golang golang locked and limited conversation to collaborators Apr 20, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants