Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: GC crash "checkmark found unmarked object" #17694

Closed
alandonovan opened this issue Oct 31, 2016 · 5 comments
Closed

runtime: GC crash "checkmark found unmarked object" #17694

alandonovan opened this issue Oct 31, 2016 · 5 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@alandonovan
Copy link
Contributor

alandonovan commented Oct 31, 2016

$ mkdir /tmp/a
$ cd /tmp/a
$ git clone https://go.googlesource.com/go
$ cd go/src
$ git checkout 61ffec4
$ ./make.bash
$ export GOPATH=/tmp/a/got 
$ export PATH=/tmp/a/go/bin:$PATH
$ go get golang.org/x/tools/cmd/guru
$ cd /tmp/a
$ GODEBUG=gccheckmark=1 got/bin/guru referrers go/src/go/ast/ast.go:#29341 # find references to ast.GenDecl.Specs

With GOMAXPROCS=1, this program terminates normally, but otherwise, on 90% of runs, it crashes with the following output:

runtime:greyobject: checkmarks finds unexpected unmarked object obj=0xc42385e700
runtime: found obj at *(0xc422eb27b0+0x18)
base=0xc422eb27b0 k=0x6211759 s.base()=0xc422eb2000 s.limit=0x0 s.sizeclass=0 s.elemsize=0 s.state=_MSpanStack
 *(base+0) = 0xc421708ff0
 *(base+8) = 0xc420b57c00
 *(base+16) = 0x8d34a0
 *(base+24) = 0xc42385e727 <==
obj=0xc42385e700 k=0x6211c2f s.base()=0xc42385e000 s.limit=0xc423860000 s.sizeclass=5 s.elemsize=64 s.state=_MSpanInUse
 *(obj+0) = 0x636f6c2f7273752f
 *(obj+8) = 0x6c676f6f672f6c61
 *(obj+16) = 0x612f656d6f682f65
 *(obj+24) = 0x2f6e61766f6e6f64
 *(obj+32) = 0x632f6372732f6f67
 *(obj+40) = 0x68732f6f74707972
 *(obj+48) = 0x36353261
 *(obj+56) = 0x0
fatal error: checkmark found unmarked object

runtime stack:
runtime.throw(0x7672cc, 0x1f)
        /usr/local/google/home/adonovan/go/src/runtime/panic.go:596 +0x95 fp=0xc420219820 sp=0xc420219800
runtime.greyobject(0xc42385e700, 0xc422eb27b0, 0x18, 0xc41fe3d0c7, 0xc400000000, 0x7f1ecb86b558, 0xc42001c528, 0x1c)
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:1294 +0x30d fp=0xc420219850 sp=0xc420219820
runtime.scanblock(0xc422eb27b0, 0x30, 0x77805d, 0xc42001c528)
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:1141 +0x173 fp=0xc4202198c8 sp=0xc420219850
runtime.scanframeworker(0xc420219bd8, 0xc420219cc8, 0xc42001c528)
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:883 +0xfc fp=0xc420219958 sp=0xc4202198c8
runtime.scanstack.func1(0xc420219bd8, 0x0, 0xc420219a01)
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:777 +0x6e fp=0xc4202199a8 sp=0xc420219958
runtime.gentraceback(0xffffffffffffffff, 0xc422eb24f0, 0x0, 0xc4204044e0, 0x0, 0x0, 0x7fffffff, 0xc420219dc8, 0x0, 0x0, ...)
        /usr/local/google/home/adonovan/go/src/runtime/traceback.go:378 +0x1085 fp=0xc420219c38 sp=0xc4202199a8
runtime.scanstack(0xc4204044e0, 0xc42001c528)
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:800 +0x265 fp=0xc420219e18 sp=0xc420219c38
runtime.scang(0xc4204044e0, 0xc42001c528)
        /usr/local/google/home/adonovan/go/src/runtime/proc.go:853 +0x209 fp=0xc420219e50 sp=0xc420219e18
runtime.markroot.func1()
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:250 +0x6d fp=0xc420219e90 sp=0xc420219e50
runtime.systemstack(0xc420219ef8)
        /usr/local/google/home/adonovan/go/src/runtime/asm_amd64.s:330 +0xab fp=0xc420219e98 sp=0xc420219e90
runtime.markroot(0xc42001c528, 0xc00000037)
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:255 +0x2bb fp=0xc420219f20 sp=0xc420219e98
runtime.gcDrain(0xc42001c528, 0x0)
        /usr/local/google/home/adonovan/go/src/runtime/mgcmark.go:999 +0xad fp=0xc420219f50 sp=0xc420219f20
runtime.gcMark(0xecad4a7c8fbbe)
        /usr/local/google/home/adonovan/go/src/runtime/mgc.go:1641 +0x12a fp=0xc420219f98 sp=0xc420219f50
runtime.gcMarkTermination.func2()
        /usr/local/google/home/adonovan/go/src/runtime/mgc.go:1228 +0xe8 fp=0xc420219fb8 sp=0xc420219f98
runtime.systemstack(0xc420020000)
        /usr/local/google/home/adonovan/go/src/runtime/asm_amd64.s:314 +0x79 fp=0xc420219fc0 sp=0xc420219fb8
runtime.mstart()
        /usr/local/google/home/adonovan/go/src/runtime/proc.go:1104 fp=0xc420219fc8 sp=0xc420219fc0

[lots more...]
@aclements
Copy link
Member

More detail from the goroutine we're scanning:

runtime:greyobject: checkmarks finds unexpected unmarked object obj=0xc423d56030
runtime: found obj at *(0xc422e5e7b0+0x18)
base=0xc422e5e7b0 k=0x621172f s.base()=0xc422e5e000 s.limit=0x0 s.sizeclass=0 s.elemsize=0 s.state=_MSpanStack
 *(base+0) = 0xc4215336b0
 *(base+8) = 0xc4220dd600
 *(base+16) = 0x8bf1c0
 *(base+24) = 0xc423d56046 <==
obj=0xc423d56030 k=0x6211eab s.base()=0xc423d56000 s.limit=0xc423d57fe0 s.sizeclass=4 s.elemsize=48 s.state=_MSpanInUse
 *(obj+0) = 0x75612f656d6f682f
 *(obj+8) = 0x672f722f6e697473
 *(obj+16) = 0x6f672f6372732f6f
 *(obj+24) = 0x67726f2e676e616c
 *(obj+32) = 0x687475616f2f782f
 *(obj+40) = 0x6b63616c732f32
fatal error: checkmark found unmarked object

runtime stack:
runtime.throw(0x758f82, 0x1f)
        /home/austin/go.dev/src/runtime/panic.go:596 +0x95 fp=0xc42004b790 sp=0xc42004b770
runtime.greyobject(0xc423d56030, 0xc422e5e7b0, 0x18, 0xc41fe154fe, 0xc400000002, 0x7f551b7fb518, 0xc420030528, 0x1)
        /home/austin/go.dev/src/runtime/mgcmark.go:1294 +0x30d fp=0xc42004b7c0 sp=0xc42004b790
runtime.scanblock(0xc422e5e7b0, 0x30, 0x769aa9, 0xc420030528)
        /home/austin/go.dev/src/runtime/mgcmark.go:1141 +0x173 fp=0xc42004b838 sp=0xc42004b7c0
runtime.scanframeworker(0xc42004bb48, 0xc42004bc38, 0xc420030528)
        /home/austin/go.dev/src/runtime/mgcmark.go:883 +0xfc fp=0xc42004b8c8 sp=0xc42004b838
runtime.scanstack.func1(0xc42004bb48, 0x0, 0xc42004b901)
        /home/austin/go.dev/src/runtime/mgcmark.go:777 +0x6e fp=0xc42004b918 sp=0xc42004b8c8
runtime.gentraceback(0xffffffffffffffff, 0xc422e5e4f0, 0x0, 0xc420265380, 0x0, 0x0, 0x7fffffff, 0xc42004bd38, 0x0, 0x0, ...)
        /home/austin/go.dev/src/runtime/traceback.go:378 +0x1085 fp=0xc42004bba8 sp=0xc42004b918
runtime.scanstack(0xc420265380, 0xc420030528)
        /home/austin/go.dev/src/runtime/mgcmark.go:800 +0x265 fp=0xc42004bd88 sp=0xc42004bba8
runtime.scang(0xc420265380, 0xc420030528)
        /home/austin/go.dev/src/runtime/proc.go:853 +0x209 fp=0xc42004bdc0 sp=0xc42004bd88
...

goroutine 1855 [chan send (scan)]:
runtime.gopark(0x760e80, 0xc4220dd658, 0x74fc4f, 0x9, 0xc420265316, 0x3)
        /home/austin/go.dev/src/runtime/proc.go:261 +0x13a fp=0xc422e5e520 sp=0x
c422e5e4f0
runtime.goparkunlock(0xc4220dd658, 0x74fc4f, 0x9, 0x16, 0x3)
        /home/austin/go.dev/src/runtime/proc.go:267 +0x5e fp=0xc422e5e560 sp=0xc
422e5e520
runtime.chansend(0x6ddd00, 0xc4220dd600, 0xc422e5e6a8, 0xc422e5e601, 0x67a4bf, 0
xc4203e81c0)
        /home/austin/go.dev/src/runtime/chan.go:229 +0x2dd fp=0xc422e5e610 sp=0x
c422e5e560
runtime.chansend1(0x6ddd00, 0xc4220dd600, 0xc422e5e6a8)
        /home/austin/go.dev/src/runtime/chan.go:113 +0x4d fp=0xc422e5e650 sp=0xc
422e5e610
golang.org/x/tools/refactor/importgraph.Build.func1.1.1(0xc4215336b0, 0xc4220dd6
00, 0x8bf1c0, 0xc423d56046, 0x19, 0xc421a1bf80)
        /tmp/a/got/src/golang.org/x/tools/refactor/importgraph/graph.go:86 +0x8f fp=0xc422e5e7b0 sp=0xc422e5e650
runtime.goexit()
        /home/austin/go.dev/src/runtime/asm_amd64.s:2184 +0x1 fp=0xc422e5e7b8 sp=0xc422e5e7b0
created by golang.org/x/tools/refactor/importgraph.Build.func1.1
        /tmp/a/got/src/golang.org/x/tools/refactor/importgraph/graph.go:140 +0x182

Specifically, we're scanning 0x30 bytes worth of "arguments" of golang.org/x/tools/refactor/importgraph.Build.func1.1.1. This is weird, because that function takes no arguments and returns nothing, so I would expect its argument frame to be empty. But it turns out the compiler can turn certain captures into arguments in some situations, and that's what's happening here. If you look at the runtime.newproc that the go statement compiles to, we set up a "call frame" for this function containing all of its captures and pass that into the go statement.

One possibility is that we're screwing up the barriers on the stack-to-stack copy newproc does to set up the new frame. Another possibility is that the liveness map is wrong at this point.

It's also a bit weird that func1.1.1 has 0x30 bytes of argument frame, but is "called" by goexit, whose frame is always 8 bytes and doesn't have room for arguments. This is probably not an issue, since I think you'd see the same thing if these were genuine arguments to the go statement rather than captures.

@aclements aclements added this to the Go1.8 milestone Oct 31, 2016
@aclements aclements self-assigned this Oct 31, 2016
@aclements aclements added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 31, 2016
@aclements
Copy link
Member

Unsurprisingly, bisect implicates bd640c8. But that just says we're missing something on the stack that mark termination was previously patching up for us.

@aclements
Copy link
Member

More observations:

  • According to -live=2, the argument frame is
&wg(FP), type.*sync.WaitGroup(SB)
sema+8(FP), type.chan int(SB)
ctxt+16(FP), type.*go/build.Context(SB)
path+24(FP), type.string(SB)
ch+40(FP), type.chan interface {}(SB)
  • This is specifically the "path" object captured by closure. (It's clearly a string, and the panic shows that it's at offset 0x18 in the argument frame, which is the "path" capture.) There doesn't seem to be anything unusual about path. It becomes dead in the parent frame after the go statement, but that's also true of sema.
  • The argument map is "111101", which is correct.
  • Currently we only do write barriers in newproc when copying the argument frame if the source stack is grey. If I remove this condition so we also do the write barriers if the source stack is black, it no longer crashes. But this really shouldn't be necessary, so @RLH, @dr2chase and I are puzzling through what's going on here.

@aclements
Copy link
Member

Found it. The goroutine stack-to-stack copy was a red herring. The problem is that the pointer came over an unbuffered channel. While we have the appropriate stack-to-stack write barriers when the receiver is blocked and the sender is writing to the receiver's stack, we don't have the barriers when the sender is blocked and the receiver is reading from the sender's stack (since we didn't need them before).

I'll send a CL to fix this. I also need to check that select is doing the right thing.

@gopherbot
Copy link

CL https://golang.org/cl/32450 mentions this issue.

@aclements aclements added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 31, 2016
@golang golang locked and limited conversation to collaborators Oct 31, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants