runtime: efence crash with long-running programs #7448

Closed
josharian opened this issue Mar 3, 2014 · 15 comments

@josharian
Contributor

What steps will reproduce the problem?

Using 8g, run http://play.golang.org/p/-tP949AItw with efence=1:

GODEBUG=efence=1 go run loop.go
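
The linked playground program is not reproduced here. Judging only from the stack trace below (main.main calling reflect.Value.MapKeys), a program of roughly the following shape would exercise the same allocation path. This is a guess at the shape, not the original source: the map contents, the helper name `countKeys`, and the bounded loop are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// countKeys reflects over a fresh map n times and returns the total number
// of keys seen. Each MapKeys call allocates, which keeps the allocator and
// the garbage collector busy.
func countKeys(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		m := map[string]int{"a": 1, "b": 2}
		total += len(reflect.ValueOf(m).MapKeys())
	}
	return total
}

func main() {
	// The report describes a loop that runs until interrupted; this sketch
	// is bounded so that it terminates.
	fmt.Println(countKeys(1000))
}
```

Saved as loop.go, it can be run with the same command as above.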


What is the expected output? What do you see instead?

Want: Run until I get bored.

Got: Crash within moments.

sweep 0 48
fatal error: gc: unswept span

runtime stack:
runtime.throw(0x9ef58)
    .../src/pkg/runtime/panic.c:464 +0x5f
markroot(0x101f2000, 0x3)
    .../src/pkg/runtime/mgc0.c:1297 +0x24f
runtime.parfordo(0x101f2000)
    .../src/pkg/runtime/parfor.c:88 +0x99
gc(0x302c7e60)
    .../src/pkg/runtime/mgc0.c:2323 +0x169
mgc(0x101da000)
    .../src/pkg/runtime/mgc0.c:2273 +0x2b
runtime.mcall(0x2199f)
    .../src/pkg/runtime/asm_386.s:190 +0x3c

goroutine 16 [garbage collection]:
runtime.gc(0x0)
    .../src/pkg/runtime/mgc0.c:2242 +0x196 fp=0x302c7e6c
runtime.mallocgc(0x2000, 0x0, 0x0)
    .../src/pkg/runtime/malloc.goc:206 +0x186 fp=0x302c7ea0
runtime.mal(0x28)
    .../src/pkg/runtime/malloc.goc:774 +0x38 fp=0x302c7eb0
reflect.mapiterinit(0x40420, 0x0, 0x0)
    .../src/pkg/runtime/hashmap.goc:1027 +0x29 fp=0x302c7ec0
reflect.Value.MapKeys(0x40420, 0x0, 0x0, 0x150, 0x0, ...)
    .../src/pkg/reflect/value.go:1222 +0xeb fp=0x302c7f58
main.main()
    .../test/mk.go:8 +0x69 fp=0x302c7f98
runtime.main()
    .../src/pkg/runtime/proc.c:245 +0xfa fp=0x302c7fcc
runtime.goexit()
    .../src/pkg/runtime/proc.c:1452 fp=0x302c7fd0
created by _rt0_go
    .../src/pkg/runtime/asm_386.s:101 +0xf7
exit status 2

Please use labels and text to provide additional information.

I think (but am not 100% sure) that this is just a problem with the efence
implementation, not a deeper GC/reflect problem. I think that this happens because the
allocator hands out the same address twice, not knowing that the backing memory has been
returned to the OS in the meantime. Dmitry pointed out this possibility in the original
efence CL review (https://golang.org/cl/22060046/).
@rsc
Contributor

rsc commented Mar 3, 2014

Comment 1:

Labels changed: added release-go1.3.

@dvyukov
Member

dvyukov commented Mar 6, 2014

Comment 2:

What os/arch?
Does it still happen for you on tip?
On linux/amd64, at 19380:32f0dc88f804, it crashes (around the time I start getting
bored) with:
runtime: out of memory: cannot allocate 131072-byte block (137438953472 in use)
fatal error: out of memory
which looks like WorkingAsIntended.

Status changed to WaitingForReply.

@dvyukov
Member

dvyukov commented Mar 6, 2014

Comment 3:

Labels changed: added repo-main.

@josharian
Contributor Author

Comment 4:

With darwin/386 tip (go version devel +32f0dc88f804 Thu Mar 06 13:16:14 2014 +0400
darwin/386), it reproduces exactly as above, with "gc: unswept span".
With linux/386 tip (go version devel +32f0dc88f804 Thu Mar 06 13:16:14 2014 +0400
linux/386), it dies almost immediately with "runtime: out of memory":
runtime: memory allocated by OS (0xa6f92000) not in usable range [0x18300000,0x98300000)
runtime: memory allocated by OS (0xb72eb000) not in usable range [0x18300000,0x98300000)
runtime: out of memory: cannot allocate 131072-byte block (536870912 in use)
fatal error: out of memory
With darwin/amd64, I get bored before it runs out of memory.
So: darwin/386 only. Sorry for only specifying "8g".

@dvyukov
Member

dvyukov commented Mar 6, 2014

Comment 5:

I can reproduce it on
go version devel +7a45730704af Thu Mar 06 18:44:14 2014 +0400 darwin/386

Owner changed to @dvyukov.

Status changed to Accepted.

@dvyukov
Member

dvyukov commented Mar 6, 2014

Comment 6:

This could be https://golang.org/issue/7159, i.e. general heap
corruption, because I also see:
unexpected fault address 0x20f60000
fatal error: fault
[signal 0xa code=0x2 addr=0x20f60000 pc=0x1ab5d]
goroutine 16 [running]:
runtime.throw(0x9de7c)
    src/pkg/runtime/panic.c:464 +0x5f fp=0x302c7e90
runtime.sigpanic()
    src/pkg/runtime/os_darwin.c:445 +0x1e4 fp=0x302c7e9c
runtime.mapiterinit(0x40580, 0x0, 0x20f60000)
    src/pkg/runtime/hashmap.goc:1008 +0x3d fp=0x302c7eb0
reflect.mapiterinit(0x40580, 0x0, 0x20f60000)
    src/pkg/runtime/hashmap.goc:1028 +0x49 fp=0x302c7ec0
reflect.Value.MapKeys(0x40580, 0x0, 0x0, 0x150, 0x0, ...)
    src/pkg/reflect/value.go:1222 +0xeb fp=0x302c7f58
main.main()
    /tmp/111.go:8 +0x69 fp=0x302c7f98
runtime.main()
    src/pkg/runtime/proc.c:245 +0xfa fp=0x302c7fcc
runtime.goexit()
    src/pkg/runtime/proc.c:1444 fp=0x302c7fd0
created by _rt0_go
    src/pkg/runtime/asm_386.s:101 +0xf7

@dvyukov
Member

dvyukov commented Mar 6, 2014

Comment 7:

Here is what happens.
Darwin uses
        p = runtime·mmap(v, n, PROT_READ|PROT_WRITE, MAP_ANON|MAP_FIXED|MAP_PRIVATE, -1, 0);
to map heap memory. MAP_FIXED happily overwrites any existing mappings.
efence significantly increases heap size (as virtual addresses are not reused).
As a result, in a small 32-bit address space, the heap eventually overlaps with other
allocations (e.g. MSpan), overwriting and corrupting them.
So I would say that this is more efence-specific. But strictly speaking, nothing prevents
it from happening without efence (especially if you have a very large heap).
It's not the time to refactor it for 1.3. Moving to 1.4.
Russ, do you agree?
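
The overwrite described above can be illustrated with a toy model. The sketch below does not call the real mmap: `addressSpace`, `mapFixed`, and the page numbers are hypothetical names and values chosen for illustration. It only models the one property that matters here, namely that a fixed-address mapping replaces whatever was already mapped in the requested range.

```go
package main

import "fmt"

// addressSpace is a toy page table: page number -> owner label.
// It is an illustration only and does not call the real mmap.
type addressSpace map[uint32]string

// mapFixed models MAP_FIXED semantics: every page in [first, first+n) is
// assigned to owner, replacing whatever was mapped there before.
// It returns the distinct previous owners that were overwritten.
func (as addressSpace) mapFixed(first, n uint32, owner string) []string {
	var clobbered []string
	seen := map[string]bool{}
	for page := first; page < first+n; page++ {
		if prev, ok := as[page]; ok && prev != owner && !seen[prev] {
			seen[prev] = true
			clobbered = append(clobbered, prev)
		}
		as[page] = owner
	}
	return clobbered
}

func main() {
	as := addressSpace{}
	as.mapFixed(100, 4, "MSpan") // metadata at pages 100..103
	as.mapFixed(90, 8, "heap")   // heap at pages 90..97

	// The heap grows by 4 pages at a fixed address, right into the metadata.
	lost := as.mapFixed(98, 4, "heap")
	fmt.Println("overwritten:", lost)
	fmt.Println("page 100 now belongs to:", as[100])
}
```

Nothing in the model reports an error when the heap grows into the metadata pages, which mirrors why the corruption only surfaces later as an unrelated-looking crash.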

Labels changed: added release-go1.4, removed release-go1.3.

@bradfitz
Contributor

bradfitz commented Mar 6, 2014

Comment 8:

In lieu of refactoring and fixing, can we at least detect this condition in Go 1.3 and
crash with a "sorry efence is broken", rather than just silently corrupt things and
crash later?
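
One way to picture the "detect and fail loudly" idea is a mapping routine that refuses to touch pages that are already in use. The sketch below is a toy model, not the runtime's code and not necessarily what the eventual fix does; `pageTable` and `mapExclusive` are hypothetical names.

```go
package main

import "fmt"

// pageTable is a toy page table: page number -> owner label.
type pageTable map[uint32]string

// mapExclusive maps n pages starting at first, but only if all of them are
// free. On overlap it returns an error and leaves the table unchanged.
func (pt pageTable) mapExclusive(first, n uint32, owner string) error {
	for page := first; page < first+n; page++ {
		if prev, ok := pt[page]; ok {
			return fmt.Errorf("page %d already mapped by %s", page, prev)
		}
	}
	for page := first; page < first+n; page++ {
		pt[page] = owner
	}
	return nil
}

func main() {
	pt := pageTable{}
	if err := pt.mapExclusive(100, 4, "MSpan"); err != nil {
		fmt.Println("fatal:", err)
		return
	}
	// The overlapping request is rejected immediately instead of silently
	// corrupting the earlier mapping.
	if err := pt.mapExclusive(98, 4, "heap"); err != nil {
		fmt.Println("fatal:", err)
	}
}
```

The design choice is to check the whole range before writing any page, so a rejected request leaves no partial mapping behind.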

@rsc
Contributor

rsc commented Mar 6, 2014

Comment 9:

This needs to be fixed one way or another for Go 1.3, *especially* if it can happen
without efence. Maybe this is what is causing the run.go failures on darwin/386.

Labels changed: added release-go1.3, removed release-go1.4.

@rsc
Contributor

rsc commented Mar 6, 2014

Comment 10:

I am seeing efence failures with only 200 MB of allocated memory in the heap. (That's
HeapSys, so it should be accurate even in efence mode.) That seems too small to be
explained by this heap overlap problem, unless the problem is much more common than it
might seem.
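
For readers who want to check the same figure in their own program, HeapSys is a field of runtime.MemStats and can be read with runtime.ReadMemStats. A minimal sketch follows; the helper name `heapSysMiB` is an assumption for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSysMiB returns the heap memory obtained from the OS, as reported by
// runtime.MemStats.HeapSys, converted to MiB.
func heapSysMiB() float64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return float64(ms.HeapSys) / (1 << 20)
}

func main() {
	fmt.Printf("HeapSys: %.1f MiB\n", heapSysMiB())
}
```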

@josharian
Contributor Author

Comment 11:

If you run with GODEBUG=efence=1,allocfreetrace=1, there's a pretty clear
pattern: MProf_Malloc(<addr>), MProf_Free(<addr>), MProf_Malloc(<addr>), then a crash
associated with <addr>.
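
Since efence is not supposed to reuse virtual addresses, a second malloc of a previously freed address is the suspicious part of that pattern. The sketch below shows how such reuse could be spotted in a sequence of events. The `event` type and the sample trace are hypothetical and do not reflect the actual allocfreetrace output format, which would need to be parsed separately.

```go
package main

import "fmt"

// event is a hypothetical, already-parsed trace record.
type event struct {
	op   string // "malloc" or "free"
	addr uintptr
}

// reusedAddrs returns the addresses that were allocated again after having
// been freed, in order of first reuse.
func reusedAddrs(events []event) []uintptr {
	freed := map[uintptr]bool{}
	reported := map[uintptr]bool{}
	var out []uintptr
	for _, e := range events {
		switch e.op {
		case "free":
			freed[e.addr] = true
		case "malloc":
			if freed[e.addr] && !reported[e.addr] {
				reported[e.addr] = true
				out = append(out, e.addr)
			}
		}
	}
	return out
}

func main() {
	// Made-up sample data, not taken from a real trace.
	trace := []event{
		{"malloc", 0x1000},
		{"malloc", 0x2000},
		{"free", 0x1000},
		{"malloc", 0x1000},
	}
	for _, a := range reusedAddrs(trace) {
		fmt.Printf("address %#x reused after free\n", a)
	}
}
```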

@rsc
Contributor

rsc commented Mar 6, 2014

Comment 12:

Dmitriy and I worked on this for a while. There are at least two different bugs here. CL
71750048 fixes them. efence will no longer make it look like your program is at fault,
but it will not run very long either: it will print that the program is out of memory
and exit.

Owner changed to @rsc.

Status changed to Started.

@rsc
Contributor

rsc commented Mar 6, 2014

Comment 13:

This issue was closed by revision da1bea0.

Status changed to Fixed.

@mikioh
Contributor

mikioh commented Mar 7, 2014

Comment 14:

Issue #7461 has been merged into this issue.

@MichaelTJones
Contributor

Comment 15:

Closed or not, I have it as of today:
mtj-macbookpro:hcn11 mtj$ time ./hcn11 10**500 10**800 | grep "14 8 5 4 3^2 2^13"
sweep 0 1906
fatal error: gc: unswept span
runtime stack:
runtime.throw(0x1ca008)
    /Users/mtj/go/src/pkg/runtime/panic.c:522 +0x77 fp=0x20832de08 sp=0x20832ddf0
markroot(0x208225000, 0x3)
    /Users/mtj/go/src/pkg/runtime/mgc0.c:533 +0x21a fp=0x20832de70 sp=0x20832de08
runtime.parfordo(0x208225000)
    /Users/mtj/go/src/pkg/runtime/parfor.c:103 +0x128 fp=0x20832dee8 sp=0x20832de70
runtime.gchelper()
    /Users/mtj/go/src/pkg/runtime/mgc0.c:1135 +0x46 fp=0x20832df10 sp=0x20832dee8
stopm()
    /Users/mtj/go/src/pkg/runtime/proc.c:973 +0x149 fp=0x20832df28 sp=0x20832df10
gcstopm()
    /Users/mtj/go/src/pkg/runtime/proc.c:1138 +0xd2 fp=0x20832df48 sp=0x20832df28
schedule()
    /Users/mtj/go/src/pkg/runtime/proc.c:1342 +0x93 fp=0x20832df70 sp=0x20832df48
park0(0x20d866000)
    /Users/mtj/go/src/pkg/runtime/proc.c:1444 +0xe7 fp=0x20832df90 sp=0x20832df70
runtime.mcall(0x36720)
    /Users/mtj/go/src/pkg/runtime/asm_amd64.s:183 +0x52 fp=0x20832dfa0 sp=0x20832df90

Attachments:

  1. death.txt (92076 bytes)

@rsc rsc added this to the Go1.3 milestone Apr 14, 2015
@rsc rsc removed the release-go1.3 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
@rsc rsc removed their assignment Jun 23, 2022
This issue was closed.