Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: lock ordering problem involving 'fin' and 'mheap' #42062

Closed
bcmills opened this issue Oct 19, 2020 · 5 comments
Closed

runtime: lock ordering problem involving 'fin' and 'mheap' #42062

bcmills opened this issue Oct 19, 2020 · 5 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Oct 19, 2020

https://build.golang.org/log/e77e304008e6ffb4bf722e8296e6a2e7621cd4ba

136026  ======
0 : fin 18 0x63b350
1 : mheap 38 0x620080
fatal error: lock ordering problem
…
136026  ======
0 : fin 18 0x63b350
1 : mheap 38 0x620080
2 : allg 11 0x63b2d0
fatal error: lock ordering problem

CC @danscales @mknyszek

136026  ======
0 : fin 18 0x63b350
1 : mheap 38 0x620080
fatal error: lock ordering problem

runtime stack:
runtime.throw(0x53cdcf, 0x15)
	/workdir/go/src/runtime/panic.go:1112 +0x72
runtime.checkRanks(0xc000082180, 0x12, 0x26)
	/workdir/go/src/runtime/lockrank_on.go:159 +0x10c
runtime.lockWithRankMayAcquire.func1()
	/workdir/go/src/runtime/lockrank_on.go:231 +0xac
runtime.lockWithRankMayAcquire(0x620080, 0x26)
	/workdir/go/src/runtime/lockrank_on.go:220 +0x7e
runtime.getempty(0x40c70f)
	/workdir/go/src/runtime/mgcwork.go:359 +0xa5
runtime.(*gcWork).init(0xc000023e98)
	/workdir/go/src/runtime/mgcwork.go:99 +0x25
runtime.(*gcWork).putBatch(0xc000023e98, 0xc000023ed0, 0x2, 0x200)
	/workdir/go/src/runtime/mgcwork.go:173 +0x19f
runtime.wbBufFlush1(0xc000022800)
	/workdir/go/src/runtime/mwbbuf.go:287 +0x237
runtime.wbBufFlush.func1()
	/workdir/go/src/runtime/mwbbuf.go:201 +0x3a
runtime.systemstack(0x0)
	/workdir/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	/workdir/go/src/runtime/proc.go:1146

goroutine 5 [running]:
runtime.systemstack_switch()
	/workdir/go/src/runtime/asm_amd64.s:330 fp=0xc00002e688 sp=0xc00002e680 pc=0x467040
runtime.wbBufFlush(0xc000080278, 0x543100)
	/workdir/go/src/runtime/mwbbuf.go:200 +0x47 fp=0xc00002e6a8 sp=0xc00002e688 pc=0x42fc47
runtime.gcWriteBarrier(0x437416, 0x542d10, 0xe000e000e, 0xc00002e7d0, 0x418b92, 0x543100, 0x63b350, 0xc000191410, 0x1, 0xc00002e7d0, ...)
	/workdir/go/src/runtime/asm_amd64.s:1463 +0xae fp=0xc00002e730 sp=0xc00002e6a8 pc=0x468e0e
runtime.gcWriteBarrierDX(0x542d10, 0xe000e000e, 0xc00002e7d0, 0x418b92, 0x543100, 0x63b350, 0xc000191410, 0x1, 0xc00002e7d0, 0x436f75, ...)
	/workdir/go/src/runtime/asm_amd64.s:1490 +0x7 fp=0xc00002e738 sp=0xc00002e730 pc=0x468e87
runtime.gopark(0x543100, 0x63b350, 0xc000191410, 0x1)
	/workdir/go/src/runtime/proc.go:325 +0x116 fp=0xc00002e758 sp=0xc00002e738 pc=0x437416
runtime.goparkunlock(...)
	/workdir/go/src/runtime/proc.go:337
runtime.runfinq()
	/workdir/go/src/runtime/mfinal.go:175 +0xb2 fp=0xc00002e7e0 sp=0xc00002e758 pc=0x418b92
runtime.goexit()
	/workdir/go/src/runtime/asm_amd64.s:1376 +0x1 fp=0xc00002e7e8 sp=0xc00002e7e0 pc=0x468d41
created by runtime.createfing
	/workdir/go/src/runtime/mfinal.go:156 +0x65
136026  ======
0 : fin 18 0x63b350
1 : mheap 38 0x620080
2 : allg 11 0x63b2d0
fatal error: lock ordering problem
panic during panic

runtime stack:
runtime.throw(0x53cdcf, 0x15)
	/workdir/go/src/runtime/panic.go:1112 +0x72
runtime.checkRanks(0xc000082180, 0x26, 0xb)
	/workdir/go/src/runtime/lockrank_on.go:159 +0x10c
runtime.lockWithRank.func1()
	/workdir/go/src/runtime/lockrank_on.go:87 +0xbc
runtime.lockWithRank(0x63b2d0, 0xb)
	/workdir/go/src/runtime/lockrank_on.go:76 +0xa5
runtime.lock(...)
	/workdir/go/src/runtime/lock_futex.go:47
runtime.tracebackothers(0xc000082180)
	/workdir/go/src/runtime/traceback.go:917 +0x93
runtime.dopanic_m(0xc000082180, 0x434752, 0xc000091de0, 0x1)
	/workdir/go/src/runtime/panic.go:1312 +0x2c9
runtime.fatalthrow.func1()
	/workdir/go/src/runtime/panic.go:1167 +0x5f
runtime.fatalthrow()
	/workdir/go/src/runtime/panic.go:1164 +0x57
runtime.throw(0x53cdcf, 0x15)
	/workdir/go/src/runtime/panic.go:1112 +0x72
runtime.checkRanks(0xc000082180, 0x12, 0x26)
	/workdir/go/src/runtime/lockrank_on.go:159 +0x10c
runtime.lockWithRankMayAcquire.func1()
	/workdir/go/src/runtime/lockrank_on.go:231 +0xac
runtime.lockWithRankMayAcquire(0x620080, 0x26)
	/workdir/go/src/runtime/lockrank_on.go:220 +0x7e
runtime.getempty(0x40c70f)
	/workdir/go/src/runtime/mgcwork.go:359 +0xa5
runtime.(*gcWork).init(0xc000023e98)
	/workdir/go/src/runtime/mgcwork.go:99 +0x25
runtime.(*gcWork).putBatch(0xc000023e98, 0xc000023ed0, 0x2, 0x200)
	/workdir/go/src/runtime/mgcwork.go:173 +0x19f
runtime.wbBufFlush1(0xc000022800)
	/workdir/go/src/runtime/mwbbuf.go:287 +0x237
runtime.wbBufFlush.func1()
	/workdir/go/src/runtime/mwbbuf.go:201 +0x3a
runtime.systemstack(0x0)
	/workdir/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	/workdir/go/src/runtime/proc.go:1146
@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 19, 2020
@bcmills bcmills added this to the Go1.16 milestone Oct 19, 2020
@bcmills
Copy link
Contributor Author

bcmills commented Oct 19, 2020

Marking as release-blocker for Go 1.16 because it's a lock-ordering problem in the runtime.

(We should at least understand whether it's a missing lock annotation or a real potential deadlock prior to the 1.16 release.)

@mknyszek
Copy link
Contributor

mknyszek commented Oct 19, 2020

Ah, this is interesting.

I think what's happening here is:

  1. There's a partial-order edge missing between the fin and mheap locks (in terms of the total order, they're fine).
  2. When we try to throw while holding these locks, we try to grab allg which is not in total order with mheap.

Overall, I think benign? We can fix (1) fairly easily. (2) might be harder to fix, but this is a second-order effect that happens specifically only when crashing in the runtime. I don't think in practice (2) even hurts diagnostics for us because in the worst case it causes a deadlock, and eventually we'll try to dump everything much more forcefully without acquiring locks (e.g. someone sends the process a SIGABRT) since we'll freeze the world (or try and dump it all anyway... maybe this is only true with GOTRACEBACK=crash).

@mknyszek mknyszek self-assigned this Oct 19, 2020
@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 19, 2020
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 19, 2020
@danscales
Copy link
Contributor

Yes the allg deadlock is not a problem, only happens because you've already crashed from something else.
Just need to fix the fin-mheap partial ordering.

@gopherbot
Copy link

Change https://golang.org/cl/263677 mentions this issue: runtime: add lock rank partial-order edge between fin and mheap

@gopherbot
Copy link

Change https://golang.org/cl/263678 mentions this issue: runtime: add lock rank partial-order edge between fin and mheap

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

No branches or pull requests

4 participants