
runtime: sigpanic during GC on android/arm64 #22204

Closed
tmm1 opened this issue Oct 10, 2017 · 6 comments
Labels
FrozenDueToAge, NeedsInvestigation, release-blocker, WaitingForInfo

Milestone
Go1.11

Comments

tmm1 (Contributor) commented Oct 10, 2017

What version of Go are you using (go version)?

go1.9.1

What operating system and processor architecture are you using (go env)?

android/arm64

What did you do?

I have a large golang server application that uses a variety of libraries, including some that use cgo. The app runs on many different platforms and is deployed on hundreds of different hardware and operating system combinations.

On android/arm64 specifically, I see periodic crashes in the golang runtime.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x4000017698 pc=0x20285915f8]

runtime stack:
runtime.throw(0x2028bda477, 0x2a)
	go/src/runtime/panic.go:605 +0x70
runtime.sigpanic()
	go/src/runtime/signal_unix.go:351 +0x264
runtime.inheap(...)
	go/src/runtime/mheap.go:377
runtime.gcmarkwb_m(0x442052bf20, 0x4425da67f0)
	go/src/runtime/mbarrier.go:163 +0xa0
runtime.writebarrierptr_prewrite1.func1()
	go/src/runtime/mbarrier.go:193 +0x54
runtime.systemstack(0x4420524f38)
	go/src/runtime/asm_arm64.s:241 +0x8c
runtime.mstart()
	go/src/runtime/proc.go:1125
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x4000006578 pc=0x203b7dd330]

runtime stack:
runtime.throw(0x203be25477, 0x2a)
	go/src/runtime/panic.go:605 +0x70
runtime.sigpanic()
	go/src/runtime/signal_unix.go:351 +0x264
runtime.heapBitsForObject(0x442195e660, 0x442195b770, 0x0, 0x441ff361cb, 0x4400000000, 0x204da725f0, 0x442001d260, 0x68)
	go/src/runtime/mbitmap.go:392 +0x88
runtime.scanobject(0x442195b770, 0x442001d260)
	go/src/runtime/mgcmark.go:1187 +0x218
runtime.gcDrain(0x442001d260, 0xd)
	go/src/runtime/mgcmark.go:943 +0x1dc
runtime.gcBgMarkWorker.func2()
	go/src/runtime/mgc.go:1796 +0x16c
runtime.systemstack(0x442001e600)
	go/src/runtime/asm_arm64.s:241 +0x8c
runtime.mstart()
	go/src/runtime/proc.go:1125

I also observed this issue before upgrading to golang 1.9. Here's a crash from golang 1.8:

unexpected fault address 0x400004bea8
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x400004bea8 pc=0x200cfc127c]

goroutine 5 [running]:
runtime.throw(0x200d4f970a, 0x5)
	go/src/runtime/panic.go:596 +0x70 fp=0x44207ad530 sp=0x44207ad510
runtime.sigpanic()
	go/src/runtime/signal_unix.go:297 +0x224 fp=0x44207ad580 sp=0x44207ad530
runtime.bulkBarrierPreWrite(0x4432faaea8, 0x4432f44a28, 0x10)
	go/src/runtime/mbitmap.go:581 +0x324 fp=0x44207ad610 sp=0x44207ad590
runtime.typedmemmove(0x200d6650c0, 0x4432faaea8, 0x4432f44a28)
	go/src/runtime/mbarrier.go:237 +0xa4 fp=0x44207ad640 sp=0x44207ad610
runtime.evacuate(0x200d6aeac0, 0x4426172570, 0x2)
	go/src/runtime/hashmap.go:1072 +0x5b4 fp=0x44207ad730 sp=0x44207ad640
runtime.growWork(0x200d6aeac0, 0x4426172570, 0x1)
	go/src/runtime/hashmap.go:957 +0x8c fp=0x44207ad750 sp=0x44207ad730
runtime.mapassign(0x200d6aeac0, 0x4426172570, 0x44207ad8a0, 0x200dc5b5d8)
	go/src/runtime/hashmap.go:513 +0x524 fp=0x44207ad7f0 sp=0x44207ad750

Slightly longer versions of these crashes are available in blevesearch/bleve#634, but I'm also happy to provide the full list of goroutine backtraces if that is helpful.

Given that the same codebase works fine on linux/freebsd/windows across amd64/i686/arm, I suspect this issue is specific to the arm64 golang runtime.

cc @aclements

tmm1 (Contributor, Author) commented Oct 11, 2017

Looks like all the crashes happen when indexing into mheap_.spans. Both inheap and heapBitsForObject perform bounds checking before reaching into spans, so I'm not sure why it's still failing.

Is this some sort of memory corruption?

go/src/runtime/mheap.go

Lines 372 to 377 in 93322a5

func inheap(b uintptr) bool {
	if b == 0 || b < mheap_.arena_start || b >= mheap_.arena_used {
		return false
	}
	// Not a beginning of a block, consult span table to find the block beginning.
	s := mheap_.spans[(b-mheap_.arena_start)>>_PageShift]

go/src/runtime/mbitmap.go

Lines 383 to 392 in 93322a5

func heapBitsForObject(p, refBase, refOff uintptr) (base uintptr, hbits heapBits, s *mspan, objIndex uintptr) {
	arenaStart := mheap_.arena_start
	if p < arenaStart || p >= mheap_.arena_used {
		return
	}
	off := p - arenaStart
	idx := off >> _PageShift
	// p points into the heap, but possibly to the middle of an object.
	// Consult the span table to find the block beginning.
	s = mheap_.spans[idx]
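
Both excerpts boil down to the same indexing arithmetic. A minimal sketch, assuming go1.9's _PageShift of 13 (8 KB pages) on arm64:

	idx := (p - mheap_.arena_start) >> 13 // page index; in bounds whenever p < arena_used
	s := mheap_.spans[idx]                // in bounds, yet faults if the page backing &spans[idx] was never mapped

So if I'm reading this right, the bounds check only guarantees the index is within the slice's length, not that the memory behind that entry of the slice is actually mapped.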

tmm1 changed the title from "runtime sigpanic on android/arm64" to "runtime: sigpanic during GC on android/arm64" Oct 11, 2017
tmm1 (Contributor, Author) commented Oct 12, 2017

I found another crash report from golang 1.8 that is slightly more interesting, but it is ultimately also happening when accessing heap spans.

runtime.(*mheap).allocSpanLocked
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x400005c0d8 pc=0x2026fc398c]

runtime stack:
runtime.throw(0x202750aaca, 0x2a)
        go/src/runtime/panic.go:596 +0x70
runtime.sigpanic()
        go/src/runtime/signal_unix.go:274 +0x26c
runtime.(*mheap).allocSpanLocked(0x2027c3c4a0, 0x1, 0x2027fa8860)
        go/src/runtime/mheap.go:726 +0xd4
runtime.(*mheap).alloc_m(0x2027c3c4a0, 0x1, 0x1e, 0x0)
        go/src/runtime/mheap.go:562 +0xc4
runtime.(*mheap).alloc.func1()
        go/src/runtime/mheap.go:627 +0x40
runtime.systemstack(0x7ff89d8b88)
        go/src/runtime/asm_arm64.s:255 +0xb8
runtime.(*mheap).alloc(0x2027c3c4a0, 0x1, 0x1000000001e, 0x2026fb705c)
        go/src/runtime/mheap.go:628 +0x60
runtime.(*mcentral).grow(0x2027c3e130, 0x0)
        go/src/runtime/mcentral.go:212 +0x78
runtime.(*mcentral).cacheSpan(0x2027c3e130, 0x4)
        go/src/runtime/mcentral.go:93 +0x350
runtime.(*mcache).refill(0x2027fa2000, 0x200000001e, 0x2027747078)
        go/src/runtime/mcache.go:122 +0x88
runtime.(*mcache).nextFree.func1()
        go/src/runtime/malloc.go:538 +0x28
runtime.systemstack(0x2027c35000)
        go/src/runtime/asm_arm64.s:241 +0x90
runtime.mstart()
        go/src/runtime/proc.go:1132

goroutine 5 [running]:
runtime.systemstack_switch()
        go/src/runtime/asm_arm64.s:190 +0x8 fp=0x4437c85460 sp=0x4437c85450
runtime.(*mcache).nextFree(0x2027fa2000, 0x2026fb361e, 0x2027c355e0, 0x2026ff0cc8, 0x2027c355e0)
        go/src/runtime/malloc.go:539 +0xa0 fp=0x4437c854b0 sp=0x4437c85460
runtime.mallocgc(0x380, 0x20276daec0, 0x20772c7301, 0x2026fc02c8)
        go/src/runtime/malloc.go:691 +0x738 fp=0x4437c85550 sp=0x4437c854b0
runtime.newarray(0x20276daec0, 0x4, 0x2027c35502)
        go/src/runtime/malloc.go:833 +0x60 fp=0x4437c85590 sp=0x4437c85550
runtime.makemap(0x2027689ca0, 0x10, 0x0, 0x0, 0x2029b474ab)
        go/src/runtime/hashmap.go:281 +0x27c fp=0x4437c855e0 sp=0x4437c85590

ianlancetaylor added this to the Go1.10 milestone Oct 12, 2017
ianlancetaylor added the NeedsInvestigation label Oct 12, 2017
ianlancetaylor (Contributor) commented Oct 12, 2017

CC @aclements @RLH

tmm1 (Contributor, Author) commented Oct 14, 2017

mapSpans is the only place in the codebase where mheap.spans is allocated:

go/src/runtime/mheap.go

Lines 542 to 555 in 93322a5

func (h *mheap) mapSpans(arena_used uintptr) {
	// Map spans array, PageSize at a time.
	n := arena_used
	n -= h.arena_start
	n = n / _PageSize * sys.PtrSize
	n = round(n, physPageSize)
	need := n / unsafe.Sizeof(h.spans[0])
	have := uintptr(len(h.spans))
	if have >= need {
		return
	}
	h.spans = h.spans[:need]
	sysMap(unsafe.Pointer(&h.spans[have]), (need-have)*unsafe.Sizeof(h.spans[0]), h.arena_reserved, &memstats.other_sys)
}
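
For a hypothetical 512 MB of arena in use (and assuming go1.9's 8192-byte _PageSize, 8-byte pointers, and a 4096-byte physPageSize), the function works out to:

	n := uintptr(512 << 20) // arena_used - arena_start
	n = n / 8192 * 8        // one 8-byte *mspan entry per 8 KB heap page: 512 KB
	n = round(n, 4096)      // already physPageSize-aligned in this case
	need := n / 8           // 65536 spans entries must be backed by mapped memory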

I wonder if that means the mmap is failing for some reason on android?
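
If it would help, something like this standalone probe (my own sketch, not runtime code; it imitates the reserve-then-MAP_FIXED pattern that sysReserve/sysMap use on linux) could check whether the Android kernel rejects it:

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

func main() {
	const size = 1 << 30 // large PROT_NONE reservation, like sysReserve
	res, err := syscall.Mmap(-1, 0, size, syscall.PROT_NONE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	base := uintptr(unsafe.Pointer(&res[0]))
	// Remap the first page read/write in place with MAP_FIXED, like sysMap.
	p, _, errno := syscall.Syscall6(syscall.SYS_MMAP, base, 4096,
		uintptr(syscall.PROT_READ|syscall.PROT_WRITE),
		uintptr(syscall.MAP_ANON|syscall.MAP_PRIVATE|syscall.MAP_FIXED),
		^uintptr(0) /* fd = -1 */, 0)
	if errno != 0 || p != base {
		fmt.Println("MAP_FIXED remap failed:", errno)
		return
	}
	*(*byte)(unsafe.Pointer(base)) = 1 // faults if the remap didn't take
	fmt.Println("reserve+remap OK at", unsafe.Pointer(base))
}

Though if I'm reading mem_linux.go right, a failed mmap there should make sysMap throw "runtime: cannot map pages in arena address space" rather than return quietly, so a plain mmap failure doesn't obviously match these crashes.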

aclements (Member) commented Jan 23, 2018
I keep coming back to this issue and not making any progress. I just don't see how this can happen.

@tmm1, do you happen to get (or can you get) core dumps from any of these? I'd love to see the memory map at the time of the crash.

My initial hunch was that we needed a memory fence between mapping spans and updating h.arena_used in mheap.setArenaUsed. This would be a little crazy since I don't see how the mmap could proceed without a full memory fence of its own, but maybe. However, the panic in #22204 (comment) happens while the heap lock is held, which means it can't be racing with mheap.setArenaUsed.
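
For concreteness, the ordering that hunch was about looks roughly like this (a simplified sketch; the names and the atomics are mine, not the runtime's):

package sketch

import "sync/atomic"

var (
	arenaUsed  uintptr   // read without the heap lock by inheap/heapBitsForObject
	arenaStart uintptr
	spans      []uintptr // stand-in for mheap_.spans
)

// grow models mheap.setArenaUsed, which runs under the heap lock.
func grow(newUsed uintptr) {
	mapSpans(newUsed)                        // mmap more spans entries first...
	atomic.StoreUintptr(&arenaUsed, newUsed) // ...then publish; needs release ordering
}

// read models inheap running concurrently, without the heap lock.
func read(p uintptr) {
	if p < arenaStart || p >= atomic.LoadUintptr(&arenaUsed) {
		return
	}
	_ = spans[(p-arenaStart)>>13] // must observe grow's mmap, or it faults
}

func mapSpans(uintptr) {} // placeholder for the real mapping work

If a reader could observe the new arena_used before the spans mapping became visible, it would fault exactly like the inheap crash above.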

I tried looking for patterns in the faulting addresses, but didn't turn up much. The heap layout should be:

[0x4000000000, 0x4020000000) spans
[0x4020000000, 0x4420000000) bitmap
[0x4420000000, 0xc420000000) heap

Which means the first three faults correspond to

0x4000017698 => heap byte  0x5da6000 (0x4425da6000)
0x4000006578 => heap byte  0x195e000 (0x442195e000)
0x400004bea8 => heap byte 0x12faa000 (0x4432faa000)

None of these are on any obvious boundaries.
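
For reference, here is the arithmetic behind that translation as a small program (spansStart and arenaStart come from the layout above; the 8-byte entry size and 8192-byte page size are go1.9 arm64 assumptions):

package main

import "fmt"

const (
	spansStart = 0x4000000000 // spans region, per the layout above
	arenaStart = 0x4420000000 // start of the heap arena
	ptrSize    = 8            // one 8-byte *mspan entry per heap page
	pageSize   = 8192         // go1.9 _PageSize
)

// heapByteForSpansFault maps a faulting address inside the spans region
// to the first heap byte described by the spans entry at that address.
func heapByteForSpansFault(addr uintptr) uintptr {
	idx := (addr - spansStart) / ptrSize
	return arenaStart + idx*pageSize
}

func main() {
	for _, a := range []uintptr{0x4000017698, 0x4000006578, 0x400004bea8} {
		fmt.Printf("%#x => heap byte %#x\n", a, heapByteForSpansFault(a))
	}
}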

The good news is Go 1.11 almost certainly can't have this problem because I've completely rewritten all of the code involved (https://golang.org/cl/85887).

aclements modified the milestones: Go1.10 → Go1.11 Jan 23, 2018
gopherbot commented
Change https://golang.org/cl/85887 mentions this issue: runtime: use sparse mappings for the heap

aclements added the WaitingForInfo label Jan 31, 2018
golang locked and limited conversation to collaborators Feb 15, 2019