runtime: garbage collector found invalid heap pointer iterating over map #9872
Comments
This looks like a general bug in how our heap allocation works vs. how the GC thinks it works. The GC expects a single, contiguous heap, but we don't get that, at least on 32-bit (64-bit also?).

The pointer in question is the buckhash pointer from runtime/mprof.go:144. Here is the sequence of SysAlloc allocations:

alloc heap at [0x1016e000,0x1026e000]

Note that the heap allocation at the end skipped over the buckhash allocation that happened early on (presumably because our requested size didn't fit adjacent to the last allocation, since the buckhash allocation was in the way). We now have a "hole" in the heap, but the GC pointer-checking code doesn't know that. It sees the pointer 0x301c0000, which lies between arena_start=0x1016e000 and arena_used=0x433b0000, and barfs because there is no MSpan describing the address 0x301c0000.

We should allocate "out of heap" MSpans for the ranges that the sysAlloc calls skipped over. The GC can ignore pointers into those areas. It's a bit complicated, because sysAlloc could later come back and fill in holes, so the "out of heap" MSpans might need to be split. Or we could just turn off this check for 1.4.2.
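To make the failure mode described above concrete, here is a minimal Go model of the check. It is only a sketch, not the runtime's code: the `span` and `arena` types, the `classify` method, the `exampleArena` helper, and the bounds of the second span are assumptions made for illustration. Only arena_start, arena_used, the first heap range, and the buckhash address 0x301c0000 are taken from the comment above.

```go
package main

import "fmt"

// span is a simplified stand-in for the runtime's MSpan: an address
// range [start, end) for which the heap has metadata.
type span struct {
	start, end uintptr
}

// arena models the two bounds the pointer check compares against
// (arena_start and arena_used) plus the spans that actually exist.
type arena struct {
	start, used uintptr
	spans       []span
}

// classify reports how an address relates to the arena. The case
// "in arena, no span" is the one the comment above describes: the
// address passes the bounds test but nothing describes it.
func (a arena) classify(p uintptr) string {
	if p < a.start || p >= a.used {
		return "outside arena"
	}
	for _, s := range a.spans {
		if p >= s.start && p < s.end {
			return "in span"
		}
	}
	return "in arena, no span"
}

// exampleArena builds an arena with a hole between its two spans.
func exampleArena() arena {
	return arena{
		start: 0x1016e000, // arena_start from the thread
		used:  0x433b0000, // arena_used from the thread
		spans: []span{
			{0x1016e000, 0x1026e000}, // heap range quoted in the thread
			{0x33b00000, 0x433b0000}, // hypothetical later heap range
		},
	}
}

func main() {
	a := exampleArena()
	// 0x301c0000 is the buckhash address from the thread.
	fmt.Println(a.classify(0x301c0000)) // prints "in arena, no span"
}
```

In this model, the "out of heap" MSpan proposal would amount to adding an entry for the hole that the check recognizes and skips, while the 1.4.2 band-aid amounts to not treating the "in arena, no span" result as fatal.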
Note that this is a bad bug: basically any 32-bit program that allocates more than 0x301c0000-0x1016e000 ~= 512MB and then triggers a GC will crash like this.
package main
import "runtime"
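The ~512MB figure in the comment above can be checked directly. The following is a small sketch of that arithmetic only; the constant names and the `gapBytes` helper are made up for this example, and the two addresses are the ones quoted in the thread.

```go
package main

import "fmt"

const (
	arenaStart = 0x1016e000 // arena_start from the thread
	buckhash   = 0x301c0000 // address of the buckhash allocation
)

// gapBytes returns the distance from the start of the arena to the
// non-heap allocation sitting in the heap's way.
func gapBytes() uint64 {
	return buckhash - arenaStart
}

func main() {
	gap := gapBytes()
	// prints "gap = 0x20052000 bytes = 512.3 MiB"
	fmt.Printf("gap = %#x bytes = %.1f MiB\n", gap, float64(gap)/(1<<20))
}
```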
I don't think this can happen in this way on 64-bit: the heap is assumed to be contiguous, and allocation fails if the runtime can't use the bits of the address space it thinks it can: https://github.com/golang/go/blob/master/src/runtime/mem_linux.go#L118 (and of course address space conflicts are way less likely on 64-bit anyway).
The same check is disabled in tip, runtime/mbitmap.go:180. The test fails as expected if I enable it. And there's a nice comment right there:
FWIW, the "Still happens sometimes" was referring to a 64-bit system. I don't believe it is this specific cause.
… invalid span on 32 bit

The 32-bit heap may have holes in it. Pointers to (non-heap) objects in those holes shouldn't cause the GC to throw.

This change is somewhat of a band-aid fix for 1.4.2. We should do a more thorough fix for tip (keep track of the holes in the heap with special MSpans, say).

Update #9872

Change-Id: Ife9ba27b77ae6ac5a6792d249c68893b3df62134
Reviewed-on: https://go-review.googlesource.com/4920
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Same problem: go 1.4.2, OS X 10.10.2, in a program that generates a massive amount of data in a tree-like structure (everything works just fine on go 1.3.3; all go 1.4 versions fail). The panic occurs during a GC triggered by a map allocation in the middle of a big recursive generation (mem size ~10Gb).

line 165: mystructs[code] = make(map[int]*MyStruct)

runtime: garbage collector found invalid heap pointer *(0x20dbc3f60+0x30)=0x208ca7000 span=0x208c9e000-0x208ca7000-0x208ca8000 state=0
runtime stack:
Have you tested your program with the race detector? Can you make a reproducible example available?
I got this with -race: limit on 8192 simultaneously alive goroutines is exceeded, dying
Not currently. Are you able to reduce the amount of work your program does to stay under this limit? Are you able to post some code that demonstrates the problem?
It seems I found the problem: I restored my previous installation and it fails like before. It appears that the external packages were built when I had go 1.4 installed on my laptop, and when I updated go to 1.4.2 it did not warn me about it (like it does on a major version). If all external packages are up to date, the problem no longer appears.
This issue appears similar to #9384, but does not require a struct and is still reproducible on some systems running 1.4.1.
The following program panics on OS X 10.9.5 and 10.7.5 with go1.4.1 darwin/386:
Here is the output:
The panic occurs consistently on iteration 13631487 (CFFFFF). It does not matter what value is assigned to the map. Defining the map with smaller types of values (bool, int8) causes it to fail after precisely double the iterations; larger (complex128) fail after half as many. Cannot reproduce on OS X 10.10 or Redhat 6.3.