Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: "non in-use span in unswept list" flake on darwin-amd64 #22987

Closed
mdempsky opened this issue Dec 4, 2017 · 5 comments
Closed

runtime: "non in-use span in unswept list" flake on darwin-amd64 #22987

mdempsky opened this issue Dec 4, 2017 · 5 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mdempsky
Copy link
Member

mdempsky commented Dec 4, 2017

https://go-review.googlesource.com/c/go/+/81775 flaked on darwin-amd64 due to:

# cmd/compile (testmain)
runtime: bad span s.state=0 s.sweepgen=564617216 sweepgen=12
fatal error: non in-use span in unswept list

https://storage.googleapis.com/go-build-log/5cf4a738/darwin-amd64-10_11_e7ffe203.log

/cc @aclements

@bradfitz bradfitz added this to the Go1.10 milestone Dec 4, 2017
@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Dec 4, 2017
@mdempsky
Copy link
Member Author

mdempsky commented Dec 4, 2017

That sweepgen value looks super bogus. I'm thinking it must be software memory corruption. (Hardware memory corruption would probably be just 1-bit off, rather than a completely different value.)

There was also #22988, which was also a darwin-amd64 flake. This makes me suspect there's something darwin-amd64-specific going on.

@gopherbot
Copy link

Change https://golang.org/cl/83016 mentions this issue: runtime: reset write barrier buffer on all flush paths

@gopherbot
Copy link

Change https://golang.org/cl/83015 mentions this issue: runtime: mark heapBits.bits nosplit

gopherbot pushed a commit that referenced this issue Dec 11, 2017
heapBits.bits is used during bulkBarrierPreWrite via
heapBits.isPointer, which means it must not be preempted. If it is
preempted, several bad things can happen:

1. This could allow a GC phase change, and the resulting shear between
the barriers and the memory writes could result in a lost pointer.

2. Since bulkBarrierPreWrite uses the P's local write barrier buffer,
if it also migrates to a different P, it could try to append to the
write barrier buffer concurrently with another write barrier. This can
result in the buffer's next pointer skipping over its end pointer,
which results in a buffer overflow that can corrupt arbitrary other
fields in the Ps (or anything in the heap, really, but it'll probably
crash from the corrupted P quickly).

Fix this by marking heapBits.bits go:nosplit. This would be the
perfect use for a recursive no-preempt annotation (#21314).

This doesn't actually affect any binaries because this function was
always inlined anyway. (I discovered it when I was modifying heapBits
and make h.bits() no longer inline, which led to rampant crashes from
problem 2 above.)

Updates #22987 and #22988 (but doesn't fix because it doesn't actually
change the generated code).

Change-Id: I60ebb928b1233b0613361ac3d0558d7b1cb65610
Reviewed-on: https://go-review.googlesource.com/83015
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Dec 11, 2017
Currently, wbBufFlush does nothing if the goroutine is dying on the
assumption that the system is crashing anyway and running the write
barrier may crash it even more. However, it fails to reset the
buffer's "next" pointer. As a result, if there are later write
barriers on the same P, the write barrier will overflow the write
barrier buffer and start corrupting other fields in the P or other
heap objects. Often, this corrupts fields in the next allocated P
since they tend to be together in the heap.

Fix this by always resetting the buffer's "next" pointer, even if
we're not doing anything with the pointers in the buffer.

Updates #22987 and #22988. (May fix; it's hard to say.)

Change-Id: I82c11ea2d399e1658531c3e8065445a66b7282b2
Reviewed-on: https://go-review.googlesource.com/83016
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@ianlancetaylor
Copy link
Contributor

@aclements Do you think this might be fixed, or should we kick it to 1.11?

@ianlancetaylor
Copy link
Contributor

Actually I'm going to close this as a likely dup of #22988. (The same questions arise on that issue: close or kick?)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants