Skip to content

runtime: garbage collector found invalid heap pointer iterating over map #9872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bdeterling opened this issue Feb 14, 2015 · 10 comments
Closed

Comments

@bdeterling
Copy link

This issue appears similar to #9384, but does not require a struct and is still reproducible on some systems running 1.4.1 .

The following program panics on OS X 10.9.5 and 10.7.5 with go1.4.1 darwin/386:

package main

import ()

func main() {
    var m = make(map[int]int)
    for i := 0; i < 50000000; i++ {
        m[i] = 1
    }
}

Here is the output:

runtime: garbage collector found invalid heap pointer *(0x54c60+0x10)=0x301c0000 s=nil
fatal error: invalid heap pointer

runtime stack:
runtime.throw(0x53823)
    /usr/local/go/src/runtime/panic.go:491 +0x83 fp=0xbffffa68 sp=0xbffffa50
scanblock(0x54c60, 0x5980, 0x3017c070)
    /usr/local/go/src/runtime/mgc0.c:378 +0x487 fp=0xbffffb08 sp=0xbffffa68
markroot(0x1017e000, 0x1)
    /usr/local/go/src/runtime/mgc0.c:496 +0x133 fp=0xbffffb40 sp=0xbffffb08
runtime.parfordo(0x1017e000)
    /usr/local/go/src/runtime/parfor.c:76 +0x93 fp=0xbffffb9c sp=0xbffffb40
gc(0xbffffcd8)
    /usr/local/go/src/runtime/mgc0.c:1439 +0x1ef fp=0xbffffcc8 sp=0xbffffb9c
runtime.gc_m()
    /usr/local/go/src/runtime/mgc0.c:1368 +0xb4 fp=0xbffffce8 sp=0xbffffcc8
runtime.onM(0x54ef8)
    /usr/local/go/src/runtime/asm_386.s:266 +0x4b fp=0xbffffcec sp=0xbffffce8
runtime.mstart()
    /usr/local/go/src/runtime/proc.c:818 fp=0xbffffcf0 sp=0xbffffcec

goroutine 1 [garbage collection]:
runtime.switchtoM()
    /usr/local/go/src/runtime/asm_386.s:208 fp=0x10189694 sp=0x10189690
runtime.gogc(0x0)
    /usr/local/go/src/runtime/malloc.go:469 +0x1a4 fp=0x101896b4 sp=0x10189694
runtime.mallocgc(0x13000000, 0x2ed60, 0x0, 0x55100)
    /usr/local/go/src/runtime/malloc.go:341 +0x2be fp=0x1018970c sp=0x101896b4
runtime.newarray(0x2ed60, 0x400000, 0x5c21)
    /usr/local/go/src/runtime/malloc.go:365 +0xaf fp=0x1018972c sp=0x1018970c
runtime.hashGrow(0x2d3e0, 0x101a4020)
    /usr/local/go/src/runtime/hashmap.go:744 +0x69 fp=0x10189744 sp=0x1018972c
runtime.mapassign1(0x2d3e0, 0x101a4020, 0x101897bc, 0x101897b8)
    /usr/local/go/src/runtime/hashmap.go:456 +0x425 fp=0x101897a0 sp=0x10189744
main.main()
    /Users/brian/gocode/src/github.com/bdeterling/memtest/main/main.go:10 +0x88 fp=0x101897cc sp=0x101897a0
runtime.main()
    /usr/local/go/src/runtime/proc.go:63 +0xc6 fp=0x101897f0 sp=0x101897cc
runtime.goexit()
    /usr/local/go/src/runtime/asm_386.s:2287 +0x1 fp=0x101897f4 sp=0x101897f0

goroutine 2 [force gc (idle)]:
runtime.gopark(0x20630, 0x54d20, 0x37488, 0xf)
    /usr/local/go/src/runtime/proc.go:130 +0xd8 fp=0x101867cc sp=0x101867b4
runtime.goparkunlock(0x54d20, 0x37488, 0xf)
    /usr/local/go/src/runtime/proc.go:136 +0x3c fp=0x101867e0 sp=0x101867cc
runtime.forcegchelper()
    /usr/local/go/src/runtime/proc.go:99 +0x9d fp=0x101867f0 sp=0x101867e0
runtime.goexit()
    /usr/local/go/src/runtime/asm_386.s:2287 +0x1 fp=0x101867f4 sp=0x101867f0
created by runtime.init·4
    /usr/local/go/src/runtime/proc.go:87 +0x1f

goroutine 3 [GC sweep wait]:
runtime.gopark(0x20630, 0x5a640, 0x37148, 0xd)
    /usr/local/go/src/runtime/proc.go:130 +0xd8 fp=0x10189fcc sp=0x10189fb4
runtime.goparkunlock(0x5a640, 0x37148, 0xd)
    /usr/local/go/src/runtime/proc.go:136 +0x3c fp=0x10189fe0 sp=0x10189fcc
runtime.bgsweep()
    /usr/local/go/src/runtime/mgc0.go:98 +0x91 fp=0x10189ff0 sp=0x10189fe0
runtime.goexit()
    /usr/local/go/src/runtime/asm_386.s:2287 +0x1 fp=0x10189ff4 sp=0x10189ff0
created by gc
    /usr/local/go/src/runtime/mgc0.c:1383

The panic occurs consistently on iteration 13631487 (CFFFFF). It does not matter what value is assigned to the map. Defining the map with smaller types of values (bool, int8) causes it to fail after precisely double the iterations; larger (complex128) fail after half as many. Cannot reproduce on OS X 10.10 or Redhat 6.3.

@randall77
Copy link
Contributor

This looks like a general bug in how our heap allocation works vs. how GC thinks it works. GC expects a single, contiguous heap, but we don't get that, at least on 32 bit (64 bit also?). The pointer in question is the buckhash pointer from runtime/mprof.go:144.

Here are the sequence of SysAlloc allocations:

alloc heap at [0x1016e000,0x1026e000]
buckhash @ 0x301c0000
alloc heap at [0x1026e000,0x1036e000]
alloc heap at [0x1036e000,0x1049e000]
alloc heap at [0x1049e000,0x106fe000]
alloc heap at [0x106fe000,0x10bbe000]
alloc heap at [0x10bbe000,0x1153e000]
alloc heap at [0x1153e000,0x1283e000]
alloc heap at [0x1283e000,0x14e3e000]
alloc heap at [0x14e3e000,0x19a3e000]
alloc heap at [0x19a3e000,0x2323e000]
alloc heap at [0x303b0000,0x433b0000]

Note that the heap allocation at the end skipped over the buckhash allocation that happened early on (presumably because our requested size didn't fit adjacent to the last allocation because the buckhash allocation was in the way). We now have a "hole" in the heap, but the GC pointer checking code doesn't know that. It sees the pointer 0x301c0000 which is between arena_start=0x1016e000 and arena_used=0x433b0000 and barfs because there is no MSpan describing the 0x301c0000 address.

We should allocate "out of heap" MSpans for the ranges which the sysAlloc calls skipped over. GC can ignore pointers to those areas. It's a bit complicated because maybe sysAlloc could come back and fill in holes, so the "out of heap" MSpans might need to be split.

Or we could just turn off this check for 1.4.2.

@randall77
Copy link
Contributor

Note that this is a bad bug - basically any 32-bit program allocating more than 0x301c0000-0x1016e000 ~= 512MB and then triggering a GC will crash like this.

package main

import "runtime"

func main() {
    var b [][]byte

    for i := 0; i < 600; i++ {
        b = append(b, make([]byte, 1048576))
    }
    runtime.GC()
}

@RLH @rsc @aclements

@mwhudson
Copy link
Contributor

I don't think this can happen in this way on 64-bit, the heap is assumed to be contiguous and allocation fails if the runtime can't use the bits of the address space it thinks it can: https://github.com/golang/go/blob/master/src/runtime/mem_linux.go#L118 (and of course address space conflicts are way less likely on 64 bit anyway),

@randall77
Copy link
Contributor

The same check is disabled in tip, runtime/mbitmap.go:180. The test fails as expected if I enable it.

And there's a nice comment right there:
// Still happens sometimes. We don't know why.

@rsc
Copy link
Contributor

rsc commented Feb 17, 2015

FWIW, The "Still happens sometimes" was referring to a 64-bit system. I don't believe it is this specific cause.

adg pushed a commit that referenced this issue Feb 17, 2015
… invalid span on 32 bit

The 32-bit heap may have holes in it.  Pointers to (non-heap) objects
in those holes shouldn't cause the GC to throw.

This change is somewhat of a band-aid fix for 1.4.2.  We should do
a more thorough fix for tip (keep track of the holes in the heap
with special MSpans, say).

Update #9872

Change-Id: Ife9ba27b77ae6ac5a6792d249c68893b3df62134
Reviewed-on: https://go-review.googlesource.com/4920
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@nprudhomme
Copy link

Same problem : go 1.4.2, os X 10.10.2 on a program that generate massive amount of data in a tree like (everything works just fine on go 1.3.3, all go 1.4 version fails)
it is mostly random, sometimes everything goes fine (~10% of the time)

the panic occurs on a gc during a map allocation in the middle a big recursive generation (mem size ~10Gb)

line 165 : mystructs[code] = make(map[int]*MyStruct)

runtime: garbage collector found invalid heap pointer *(0x20dbc3f60+0x30)=0x208ca7000 span=0x208c9e000-0x208ca7000-0x208ca8000 state=0
fatal error: invalid heap pointer

runtime stack:
runtime.throw(0x2b1b63)
/usr/local/go/src/runtime/panic.go:491 +0xad fp=0x2083adb08 sp=0x2083adad8
scanblock(0x20dbc3f60, 0x78, 0x219060)
/usr/local/go/src/runtime/mgc0.c:378 +0x551 fp=0x2083adc48 sp=0x2083adb08
scanframe(0x2083add50, 0x0, 0x1)
/usr/local/go/src/runtime/mgc0.c:719 +0x164 fp=0x2083adcb8 sp=0x2083adc48
runtime.gentraceback(0x40725, 0x20dbc3e10, 0x0, 0x209745e60, 0x0, 0x0, 0x7fffffff, 0x2083ade00, 0x0, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:311 +0x7a8 fp=0x2083adda8 sp=0x2083adcb8
scanstack(0x209745e60)
/usr/local/go/src/runtime/mgc0.c:777 +0x21c fp=0x2083ade18 sp=0x2083adda8
markroot(0x208324000, 0x6245)
/usr/local/go/src/runtime/mgc0.c:553 +0xe7 fp=0x2083ade78 sp=0x2083ade18
runtime.parfordo(0x208324000)
/usr/local/go/src/runtime/parfor.c:91 +0x13b fp=0x2083adef8 sp=0x2083ade78
runtime.gchelper()
/usr/local/go/src/runtime/mgc0.c:1185 +0x4a fp=0x2083adf20 sp=0x2083adef8
stopm()
/usr/local/go/src/runtime/proc.c:1181 +0x158 fp=0x2083adf40 sp=0x2083adf20
findrunnable(0x208327200)
/usr/local/go/src/runtime/proc.c:1487 +0x562 fp=0x2083adf78 sp=0x2083adf40
schedule()
/usr/local/go/src/runtime/proc.c:1575 +0x151 fp=0x2083adfa8 sp=0x2083adf78
runtime.park_m(0x208314120)
/usr/local/go/src/runtime/proc.c:1654 +0x113 fp=0x2083adfd0 sp=0x2083adfa8
runtime.mcall(0x58e34)
/usr/local/go/src/runtime/asm_amd64.s:186 +0x5a fp=0x2083adfe0 sp=0x2083adfd0
goroutine 1 [garbage collection]:
runtime.switchtoM()
/usr/local/go/src/runtime/asm_amd64.s:198 fp=0x20ba3ef60 sp=0x20ba3ef58
runtime.gogc(0x200000000)
/usr/local/go/src/runtime/malloc.go:469 +0x1cf fp=0x20ba3ef98 sp=0x20ba3ef60
runtime.mallocgc(0x30, 0x193820, 0x0, 0x20ba3f0d0)
/usr/local/go/src/runtime/malloc.go:341 +0x391 fp=0x20ba3f048 sp=0x20ba3ef98
runtime.newobject(0x193820, 0x23768c8fccc17701)
/usr/local/go/src/runtime/malloc.go:353 +0x49 fp=0x20ba3f070 sp=0x20ba3f048
runtime.makemap(0x151c20, 0x0, 0x208616c6c)
/usr/local/go/src/runtime/hashmap.go:220 +0x36e fp=0x20ba3f0c0 sp=0x20ba3f070
main.searchTenant(0x1, 0x208d4abb0, 0x20859ea00, 0x208c06870, 0x3, 0x2208344b40, 0x20eedb460, 0x20eedb460)
/Users/np/tools/go/src/tenant/tenanttype.go:165 +0xc49 fp=0x20ba3f398 sp=0x20ba3f0c0
[....]

@davecheney
Copy link
Contributor

Have you tested your program with the race detector?

Can you make a reproducible example available?

On 24 Feb 2015, at 19:07, nprudhomme notifications@github.com wrote:

Same problem : go 1.4.2, os X 10.10.2 on a program that generate massive amount of data in a tree like (everything works just fine on go 1.3.3, all go 1.4 version fails)
it is mostly random, sometimes everything goes fine (~10% of the time)

the panic occurs on a gc during a map allocation in the middle a big recursive generation (mem size ~10Gb)

line 165 : mystructs[code] = make(map[int]*MyStruct)

runtime: garbage collector found invalid heap pointer *(0x20dbc3f60+0x30)=0x208ca7000 span=0x208c9e000-0x208ca7000-0x208ca8000 state=0
fatal error: invalid heap pointer

runtime stack:
runtime.throw(0x2b1b63)
/usr/local/go/src/runtime/panic.go:491 +0xad fp=0x2083adb08 sp=0x2083adad8
scanblock(0x20dbc3f60, 0x78, 0x219060)
/usr/local/go/src/runtime/mgc0.c:378 +0x551 fp=0x2083adc48 sp=0x2083adb08
scanframe(0x2083add50, 0x0, 0x1)
/usr/local/go/src/runtime/mgc0.c:719 +0x164 fp=0x2083adcb8 sp=0x2083adc48
runtime.gentraceback(0x40725, 0x20dbc3e10, 0x0, 0x209745e60, 0x0, 0x0, 0x7fffffff, 0x2083ade00, 0x0, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:311 +0x7a8 fp=0x2083adda8 sp=0x2083adcb8
scanstack(0x209745e60)
/usr/local/go/src/runtime/mgc0.c:777 +0x21c fp=0x2083ade18 sp=0x2083adda8
markroot(0x208324000, 0x6245)
/usr/local/go/src/runtime/mgc0.c:553 +0xe7 fp=0x2083ade78 sp=0x2083ade18
runtime.parfordo(0x208324000)
/usr/local/go/src/runtime/parfor.c:91 +0x13b fp=0x2083adef8 sp=0x2083ade78
runtime.gchelper()
/usr/local/go/src/runtime/mgc0.c:1185 +0x4a fp=0x2083adf20 sp=0x2083adef8
stopm()
/usr/local/go/src/runtime/proc.c:1181 +0x158 fp=0x2083adf40 sp=0x2083adf20
findrunnable(0x208327200)
/usr/local/go/src/runtime/proc.c:1487 +0x562 fp=0x2083adf78 sp=0x2083adf40
schedule()
/usr/local/go/src/runtime/proc.c:1575 +0x151 fp=0x2083adfa8 sp=0x2083adf78
runtime.park_m(0x208314120)
/usr/local/go/src/runtime/proc.c:1654 +0x113 fp=0x2083adfd0 sp=0x2083adfa8
runtime.mcall(0x58e34)
/usr/local/go/src/runtime/asm_amd64.s:186 +0x5a fp=0x2083adfe0 sp=0x2083adfd0
goroutine 1 [garbage collection]:
runtime.switchtoM()
/usr/local/go/src/runtime/asm_amd64.s:198 fp=0x20ba3ef60 sp=0x20ba3ef58
runtime.gogc(0x200000000)
/usr/local/go/src/runtime/malloc.go:469 +0x1cf fp=0x20ba3ef98 sp=0x20ba3ef60
runtime.mallocgc(0x30, 0x193820, 0x0, 0x20ba3f0d0)
/usr/local/go/src/runtime/malloc.go:341 +0x391 fp=0x20ba3f048 sp=0x20ba3ef98
runtime.newobject(0x193820, 0x23768c8fccc17701)
/usr/local/go/src/runtime/malloc.go:353 +0x49 fp=0x20ba3f070 sp=0x20ba3f048
runtime.makemap(0x151c20, 0x0, 0x208616c6c)
/usr/local/go/src/runtime/hashmap.go:220 +0x36e fp=0x20ba3f0c0 sp=0x20ba3f070
main.searchTenant(0x1, 0x208d4abb0, 0x20859ea00, 0x208c06870, 0x3, 0x2208344b40, 0x20eedb460, 0x20eedb460)
/Users/np/tools/go/src/tenant/tenanttype.go:165 +0xc49 fp=0x20ba3f398 sp=0x20ba3f0c0
[....]


Reply to this email directly or view it on GitHub.

@nprudhomme
Copy link

i've got this with -race: limit on 8192 simultaneously alive goroutines is exceeded, dying
is this limit configurable ?

@davecheney
Copy link
Contributor

Not currently. Are you able to reduce the amount of work your program does to stay under this limit? Are you able to post some code that demonstrates the problem?

On 24 Feb 2015, at 21:26, nprudhomme notifications@github.com wrote:

i've got this with -race: limit on 8192 simultaneously alive goroutines is exceeded, dying
is this limit configurable ?


Reply to this email directly or view it on GitHub.

@nprudhomme
Copy link

it seems i found the problem :
i reinstalled go 1.4.2 (on my go 1.3.3)
recompile all external packages (like github.com/deckarep/golang-set) for go1.4.2 and it works

i restore my previous installation and it fails like before. it appears that external packages where build when i had a go 1.4 installed on my laptop and when i update go to 1.4.2, it does not warn me about it (like it does on a major version). if all externals are up to date, the problem does not appear anymore

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants