Skip to content

runtime: "go list std cmd" fault #11105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aclements opened this issue Jun 6, 2015 · 5 comments
Closed

runtime: "go list std cmd" fault #11105

aclements opened this issue Jun 6, 2015 · 5 comments
Milestone

Comments

@aclements
Copy link
Member

Many of the builders are failing with a fault in go list std cmd such as the one below. I can reproduce this in about 1 in 10 runs with

stress go list cmd

This appears to be a combination of stack barriers and GOMAXPROCS>1. With GOMAXPROCS=1 or with stack barriers disabled, it doesn't happen.

I think this may be a write through an up-pointer without a write barrier. With the series ending at CL 10791 applied (which fixes checkmarks mode and some bugs in stack barriers), if you set GODEBUG=gccheckmark=1, it quickly fails with a checkmark failure from a slot in the stack that's a fair bit above the next stack barrier for that stack. I confirmed that this is not the known problem with unbuffered channel sends writing to a remote stack.

Annoyingly, I can't reproduce this in my memory tracing Pin tool, so I can't take the brute force approach to finding out what wrote to that stack slot. This also suggests that there may be a race involved that Pin's slow down is preventing.

From http://build.golang.org/log/0641f2eba9b33697d963dd2e22cbffe7a99bcbd4:

2015/06/06 20:38:42 Error running go list std cmd: exit status 2, unexpected fault address 0x43202f2f
fatal error: fault
[signal 0xb code=0x1 addr=0x43202f2f pc=0x8167ab4]

goroutine 1 [running]:
runtime.throw(0x844c358, 0x5)
    /tmp/workdir/go/src/runtime/panic.go:527 +0x7f fp=0x18af5a7c sp=0x18af5a70
runtime.sigpanic()
    /tmp/workdir/go/src/runtime/sigpanic_unix.go:27 +0x276 fp=0x18af5aa8 sp=0x18af5a7c
strings.Index(0x43202f2f, 0x7279706f, 0x8442810, 0x3, 0xffffffff)
    /tmp/workdir/go/src/strings/strings.go:166 +0x104 fp=0x18af5adc sp=0x18af5aa8
strings.Contains(0x43202f2f, 0x7279706f, 0x8442810, 0x3, 0x0)
    /tmp/workdir/go/src/strings/strings.go:133 +0x39 fp=0x18af5af4 sp=0x18af5adc
main.importPaths(0x18ca2000, 0xd1, 0x140, 0x0, 0x0, 0x0)
    /tmp/workdir/go/src/cmd/go/main.go:341 +0xfa fp=0x18af5ba0 sp=0x18af5af4
main.packagesAndErrors(0x1880a140, 0x2, 0x2, 0x0, 0x0, 0x0)
    /tmp/workdir/go/src/cmd/go/pkg.go:1007 +0x1a4 fp=0x18af5ca4 sp=0x18af5ba0
main.packages(0x1880a140, 0x2, 0x2, 0x0, 0x0, 0x0)
    /tmp/workdir/go/src/cmd/go/pkg.go:988 +0x64 fp=0x18af5d18 sp=0x18af5ca4
main.runList(0x864df40, 0x1880a140, 0x2, 0x2)
    /tmp/workdir/go/src/cmd/go/list.go:177 +0x192 fp=0x18af5e18 sp=0x18af5d18
main.main()
    /tmp/workdir/go/src/cmd/go/main.go:177 +0x656 fp=0x18af5fa8 sp=0x18af5e18
runtime.main()
    /tmp/workdir/go/src/runtime/proc.go:111 +0x276 fp=0x18af5fd0 sp=0x18af5fa8
runtime.goexit()
    /tmp/workdir/go/src/runtime/asm_386.s:1666 +0x1 fp=0x18af5fd4 sp=0x18af5fd0

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /tmp/workdir/go/src/runtime/asm_386.s:1666 +0x1

goroutine 5 [syscall]:
os/signal.loop()
    /tmp/workdir/go/src/os/signal/signal_unix.go:22 +0x1a
created by os/signal.init.1
    /tmp/workdir/go/src/os/signal/signal_unix.go:28 +0x36

/cc @RLH @rsc

@aclements aclements self-assigned this Jun 6, 2015
@aclements aclements added this to the Go1.5 milestone Jun 6, 2015
@minux
Copy link
Member

minux commented Jun 7, 2015 via email

@aclements
Copy link
Member Author

My bigger concern is that this might mean we're missing a write barrier in some corner case and stack barriers just make us more sensitive. This is the case, for example, with unbuffered channel sends that write to another G's stack and that G doesn't get scheduled before mark termination, which Russ is going to fix Very Soon. Or it might just be a bug in stack barriers.

@aclements
Copy link
Member Author

I believe the unmarked object is the backing store of the stk slice in cmd/go/pkg.go:packagesAndErrors. This is a []string (as type importStack), and a pointer to it is passed down to loadPackage and many other functions from there. stk is on packagesAndErrors's stack because it never escapes, but the backing store is in the heap.

The only obviously interesting thing that happens to stk is that *importStack has a push method that does an append on the pointed-to slice. But the disassembly of the push method certainly appears to be executing a write barrier when it writes the new backing store through the receiver.

Here's a representative checkmarks failure with some annotations:

runtime:greyobject: checkmarks finds unexpected unmarked object obj=0xc208844070
runtime: found obj at *(0xc2081457a0+0x68)
# ^ AKA 0xc208145808
base=0xc2081457a0 k=0x61040a2 s.start*_PageSize=0xc20813e000 s.limit=0x0 s.sizeclass=0 s.elemsize=0
obj=0xc208844070 k=0x6104422 s.start*_PageSize=0xc208844000 s.limit=0xc208845ff0 s.sizeclass=8 s.elemsize=112
 *(obj+0) = 0xc20816e858
 *(obj+8) = 0xd
 *(obj+16) = 0xc2087bb531
 *(obj+24) = 0x5
 *(obj+32) = 0xc2087bbbd1
 *(obj+40) = 0x5
 *(obj+48) = 0xc2088325d1
 *(obj+56) = 0x6
 *(obj+64) = 0x9463c0
 *(obj+72) = 0x7
 *(obj+80) = 0x0
 *(obj+88) = 0x0
 *(obj+96) = 0x0
 *(obj+104) = 0x0
# ^ This is the backing store of stk with a few short strings in it.
fatal error: checkmark found unmarked object

...

goroutine 1 [runnable]:
...
main.(*Package).load(0xc20880c400, 0xc208145808, 0xc20878db00, 0x0, 0x0, 0x0)
    /home/austin/go.dev/src/cmd/go/pkg.go:624 +0x4eaa fp=0xc208145448 sp=0xc208144b00
main.loadPackage(0xc20816e858, 0xd, 0xc208145808, 0x0)
    /home/austin/go.dev/src/cmd/go/pkg.go:952 +0x958 fp=0xc208145748 sp=0xc208145448
# ^ 0xc208145808 is the location of the pointer to the unmarked object.
# In the code, this is &stk passed to loadPackage.
main.packagesAndErrors(0xc208094d80, 0x3f, 0x48, 0x0, 0x0, 0x0)
    /home/austin/go.dev/src/cmd/go/pkg.go:1014 +0x3b4 fp=0xc208145938 sp=0xc208145748
# ^ checkmarks base is between sp and fp of packagesAndErrors.
...

@aclements
Copy link
Member Author

Found it. This is, in fact, a bug I thought I had fixed, but I fixed it wrong. Commit 1303957 was supposed to enable write barriers during the scan phase, but it only enabled calls to the write barrier during the scan phase. The write barrier itself then looked at the phase and bailed out during scan phase. CL coming shortly (assuming I don't lose in-flight wifi again).

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/10801 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants