Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: heapBitsForSpan: base out of range #20259

Closed
karalabe opened this issue May 5, 2017 · 10 comments
Closed

runtime: heapBitsForSpan: base out of range #20259

karalabe opened this issue May 5, 2017 · 10 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@karalabe
Copy link
Contributor

karalabe commented May 5, 2017

We've received another crash report on android/arm (ARMv7) with Go 1.8.1 around memory allocations: ethereum/go-ethereum#14422.

The report claims that it was a one off event and couldn't reproduce it, so we can't really tell much more about it than the crash log. The crash part of the dump is below, but you can look ad the linked issue for the full version.

E/Go: fatal error: heapBitsForSpan: base out of range
E/Go: runtime stack:
E/Go: runtime.throw(0x9f0a62d5, 0x22)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/panic.go:596 +0x78
E/Go: runtime.heapBitsForSpan(0x93fd0000, 0x1, 0x16)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/mbitmap.go:377 +0x84
E/Go: runtime.(*mcentral).grow(0x9f92fc08, 0x0)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/mcentral.go:220 +0xc0
E/Go: runtime.(*mcentral).cacheSpan(0x9f92fc08, 0xf0)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/mcentral.go:93 +0x10c
E/Go: runtime.(*mcache).refill(0x7e4484b0, 0x16, 0x744)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/mcache.go:122 +0x84
E/Go: runtime.(*mcache).nextFree.func1()
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/malloc.go:526 +0x24
E/Go: runtime.systemstack(0x8e698a80)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/asm_arm.s:264 +0x8c
E/Go: runtime.mstart()
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/proc.go:1132
E/Go: goroutine 26327 [running]:
E/Go: runtime.systemstack_switch()
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/asm_arm.s:209 +0x4 fp=0x98a02bb8 sp=0x98a02bb4
E/Go: runtime.(*mcache).nextFree(0x7e4484b0, 0x16, 0x0, 0x9ed7102c, 0x0)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/malloc.go:527 +0xa0 fp=0x98a02be4 sp=0x98a02bb8
E/Go: runtime.mallocgc(0x1a0, 0x9f4abc98, 0x9e90ed01, 0x93449a00)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/malloc.go:679 +0x918 fp=0x98a02c3c sp=0x98a02be4
E/Go: runtime.newobject(0x9f4abc98, 0x93449a00)
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/malloc.go:808 +0x2c fp=0x98a02c50 sp=0x98a02c3c
E/Go: github.com/ethereum/go-ethereum/consensus/clique.sigHash(0x8fad6200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
E/Go: 	/home/travis/go/src/github.com/ethereum/go-ethereum/consensus/clique/clique.go:141 +0x50 fp=0x98a02c9c sp=0x98a02c50
E/Go: github.com/ethereum/go-ethereum/consensus/clique.ecrecover(0x8fad6200, 0x0, 0x0, 0x0, 0x0, 0x0, 0xd59502e1, 0x67d94ad7)
E/Go: 	/home/travis/go/src/github.com/ethereum/go-ethereum/consensus/clique/clique.go:173 +0x9c fp=0x98a02d00 sp=0x98a02c9c
E/Go: github.com/ethereum/go-ethereum/consensus/clique.(*Clique).verifySeal(0x8ee910e0, 0x9f873ac8, 0x8ee91180, 0x8fad6200, 0x8f7ec000, 0x21e, 0x1580, 0xf4ccd9d, 0x6ddf6dff)
E/Go: 	/home/travis/go/src/github.com/ethereum/go-ethereum/consensus/clique/clique.go:467 +0x14c fp=0x98a02e14 sp=0x98a02d00
E/Go: github.com/ethereum/go-ethereum/consensus/clique.(*Clique).verifyCascadingFields(0x8ee910e0, 0x9f873ac8, 0x8ee91180, 0x8fad6200, 0x8f7ec000, 0x21e, 0x1580, 0x2, 0x0)
E/Go: 	/home/travis/go/src/github.com/ethereum/go-ethereum/consensus/clique/clique.go:354 +0x49c fp=0x98a02edc sp=0x98a02e14
E/Go: github.com/ethereum/go-ethereum/consensus/clique.(*Clique).verifyHeader(0x8ee910e0, 0x9f873ac8, 0x8ee91180, 0x8fad6200, 0x8f7ec000, 0x21e, 0x1580, 0x0, 0x0)
E/Go: 	/home/travis/go/src/github.com/ethereum/go-ethereum/consensus/clique/clique.go:311 +0x2fc fp=0x98a02f48 sp=0x98a02edc
E/Go: github.com/ethereum/go-ethereum/consensus/clique.(*Clique).VerifyHeaders.func1(0x8f7ec000, 0x800, 0x1580, 0x8ee910e0, 0x9f873ac8, 0x8ee91180, 0x94022d00, 0x94022d40)
E/Go: 	/home/travis/go/src/github.com/ethereum/go-ethereum/consensus/clique/clique.go:243 +0x70 fp=0x98a02fcc sp=0x98a02f48
E/Go: runtime.goexit()
E/Go: 	/home/travis/build/ethereum/go-ethereum/go/src/runtime/asm_arm.s:1017 +0x4 fp=0x98a02fcc sp=0x98a02fcc
E/Go: created by github.com/ethereum/go-ethereum/consensus/clique.(*Clique).VerifyHeaders
E/Go: 	/home/travis/go/src/github.com/ethereum/go-ethereum/consensus/clique/clique.go:251 +0xb4
@bradfitz bradfitz added this to the Go1.8.2 milestone May 5, 2017
@gopherbot
Copy link

CL https://golang.org/cl/43310 mentions this issue.

gopherbot pushed a commit that referenced this issue May 12, 2017
This adds debugging information when we panic with "heapBitsForSpan:
base out of range".

Updates #20259.

Change-Id: I0dc1a106aa9e9531051c7d08867ace5ef230eb3f
Reviewed-on: https://go-review.googlesource.com/43310
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@gopherbot
Copy link

CL https://golang.org/cl/43410 mentions this issue.

@aclements aclements added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 16, 2017
@bradfitz bradfitz modified the milestones: Go1.8.2, Go1.8.3 May 18, 2017
@aclements
Copy link
Member

Some debugging notes:

runtime.mallocgc(0x1a0, 0x9f4abc98, 0x9e90ed01, 0x93449a00) - Size is 0x1a0 (416) bytes.
runtime.(*mcache).nextFree(0x7e4484b0, 0x16, 0x0, 0x9ed7102c, 0x0) - Size class is 0x16, which is exactly 416 bytes.
runtime.(*mcache).refill(0x7e4484b0, 0x16, 0x744) - The class 0x16 mcache was out of free objects.
runtime.(*mcentral).grow(0x9f92fc08, 0x0) - The class 0x16 mcentral was also out of free objects.

At this point it would have called mheap_.alloc(1, 0x16, false, true) to allocate a new 1 page span. Given that heapBitsForSpan is complaining about the base, this probably got to mheap_.grow(1), since that's the only path that modifies the arena bounds.

We can also glean some things about the address space layout. Since this is 32-bit, arena_start is 0. That means the failure must have happened because base >= mheap_.arena_used, where base = 0x93fd0000. This base is interesting because it's below Go's static data, which includes the string to throw at 0x9f0a62d5 and the mcentral at 0x9f92fc08. I'm not positive, but I think this means we're allocating outside the initial arena reservation. However, this g's stack starts at address 0x98a02fcc, which means we've already heap allocated above the bad base address, which means arena_used > base at some point in the past.

This suggests two possibilities: 1. arena_used somehow got entirely corrupted (which isn't very helpful for debugging) or 2. some intentional update to arena_used decreased it (possibly as a result of overflow), which is never supposed to happen.

@aclements
Copy link
Member

Found it (I love symbolic execution). Actually, I'm surprised this doesn't fail more often.

  1. mallocinit reserves some memory for the arena. Let's say it gets a 128MB mapping at a high address like 0x70000000. It sets arena_start to 0 (because this is 32-bit), arena_used to 0x70000000, and arena_end to 0x78000000.
  2. Now, we do an allocation larger than arena_end - arena_used. Let's say it's a 256MB allocation. mheap.sysAlloc takes the first branch to grow the reservation. There's still plenty of address space above arena_end, so it calls sysReserve with arena_end.
  3. sysReserve is free to put the mapping wherever it wants (the address is just a hint). Let's say it returns a low address for p, like 0x10000000.
  4. mheap.sysAlloc takes the h.arena_start <= p && p+p_size-h.arena_start-1 <= _MaxArena32 branch. But this code assumes that the new mapping will appear somewhere above arena_end, so it blindly sets arena_start to p. In this case, this lowers arena_used to 0x10000000, so now we think everything above that is not in the Go heap.

I think the fix is easy, but we've screwed up this code more times than I care to count. :) I'll put together and verify a patch on Monday, if not earlier.

@bradfitz bradfitz added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 19, 2017
@gopherbot
Copy link

CL https://golang.org/cl/43870 mentions this issue.

@aclements
Copy link
Member

Reopening: CL 43870 is not on release branch.

@aclements aclements reopened this May 23, 2017
@gopherbot
Copy link

CL https://golang.org/cl/43954 mentions this issue.

@karalabe
Copy link
Contributor Author

Any particular reason for postponing this (and everything else) from Go 1.8.2 to 1.8.3? Other than 1.8.2 being an imminent security release?

@aclements
Copy link
Member

It got moved to 1.8.3 precisely because 1.8.2 is a security release (and we don't combine other things into security releases). I believe the plan is to release 1.8.3 basically immediately after 1.8.2, so for people who don't care about the distinction between security and non-security point releases, it shouldn't matter.

gopherbot pushed a commit that referenced this issue May 23, 2017
Cherry-pick of CL 43310.

This adds debugging information when we panic with "heapBitsForSpan:
base out of range".

Updates #20259.

Change-Id: I0dc1a106aa9e9531051c7d08867ace5ef230eb3f
Reviewed-on: https://go-review.googlesource.com/43410
Run-TryBot: Austin Clements <austin@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gopherbot pushed a commit that referenced this issue May 23, 2017
Cherry-pick of CL 43870.

If mheap.sysAlloc doesn't have room in the heap arena for an
allocation, it will attempt to map more address space with sysReserve.
sysReserve is given a hint, but can return any unused address range.
Currently, mheap.sysAlloc incorrectly assumes the returned region will
never fall between arena_start and arena_used. If it does,
mheap.sysAlloc will blindly accept the new region as the new
arena_used and arena_end, causing these to decrease and make it so any
Go heap above the new arena_used is no longer considered part of the
Go heap. This assumption *used to be* safe because we had all memory
between arena_start and arena_used mapped, but when we switched to an
arena_start of 0 on 32-bit, it became no longer safe.

Most likely, we've only recently seen this bug occur because we
usually start arena_used just above the binary, which is low in the
address space. Hence, the kernel is very unlikely to give us a region
before arena_used.

Since mheap.sysAlloc is a linear allocator, there's not much we can do
to handle this well. Hence, we fix this problem by simply rejecting
the new region if it isn't after arena_end. In this case, we'll take
the fall-back path and mmap a small region at any address just for the
requested memory.

Fixes #20259.

Change-Id: Ib72e8cd621545002d595c7cade1e817cfe3e5b1e
Reviewed-on: https://go-review.googlesource.com/43954
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
@broady broady closed this as completed May 23, 2017
gopherbot pushed a commit that referenced this issue May 25, 2017
Currently, the heap arena allocator allocates monotonically increasing
addresses. This is fine on 64-bit where we stake out a giant block of
the address space for ourselves and start at the beginning of it, but
on 32-bit the arena starts at address 0 but we start allocating from
wherever the OS feels like giving us memory. We can generally hint the
OS to start us at a low address, but this doesn't always work.

As a result, on 32-bit, if the OS gives us an arena block that's lower
than the current block we're allocating from, we simply say "thanks
but no thanks", return the whole (256MB!) block of memory, and then
take a fallback path that mmaps just the amount of memory we need
(which may be as little as 8K).

We have to do this because mheap_.arena_used is *both* the highest
used address in the arena and the next address we allocate from.

Fix all of this by separating the second role of arena_used out into a
new field called arena_alloc. This lets us accept any arena block the
OS gives us. This also slightly changes the invariants around
arena_end. Previously, we ensured arena_used <= arena_end, but this
was related to arena_used's second role, so the new invariant is
arena_alloc <= arena_end. As a result, we no longer necessarily update
arena_end when we're updating arena_used.

Fixes #20259 properly. (Unlike the original fix, this one should not
be cherry-picked to Go 1.8.)

This is reasonably low risk. I verified several key properties of the
32-bit code path with both 4K and 64K physical pages using a symbolic
model and the change does not materially affect 64-bit (arena_used ==
arena_alloc on 64-bit). The only oddity is that we no longer call
setArenaUsed with racemap == false to indicate that we're creating a
hole in the address space, but this only happened in a 32-bit-only
code path, and the race detector require 64-bit, so this never
mattered anyway.

Change-Id: Ib1334007933e615166bac4159bf357ae06ec6a25
Reviewed-on: https://go-review.googlesource.com/44010
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/85887 mentions this issue: runtime: use sparse mappings for the heap

gopherbot pushed a commit that referenced this issue Feb 15, 2018
This replaces the contiguous heap arena mapping with a potentially
sparse mapping that can support heap mappings anywhere in the address
space.

This has several advantages over the current approach:

* There is no longer any limit on the size of the Go heap. (Currently
  it's limited to 512GB.) Hence, this fixes #10460.

* It eliminates many failures modes of heap initialization and
  growing. In particular it eliminates any possibility of panicking
  with an address space conflict. This can happen for many reasons and
  even causes a low but steady rate of TSAN test failures because of
  conflicts with the TSAN runtime. See #16936 and #11993.

* It eliminates the notion of "non-reserved" heap, which was added
  because creating huge address space reservations (particularly on
  64-bit) led to huge process VSIZE. This was at best confusing and at
  worst conflicted badly with ulimit -v. However, the non-reserved
  heap logic is complicated, can race with other mappings in non-pure
  Go binaries (e.g., #18976), and requires that the entire heap be
  either reserved or non-reserved. We currently maintain the latter
  property, but it's quite difficult to convince yourself of that, and
  hence difficult to keep correct. This logic is still present, but
  will be removed in the next CL.

* It fixes problems on 32-bit where skipping over parts of the address
  space leads to mapping huge (and never-to-be-used) metadata
  structures. See #19831.

This also completely rewrites and significantly simplifies
mheap.sysAlloc, which has been a source of many bugs. E.g., #21044,
 #20259, #18651, and #13143 (and maybe #23222).

This change also makes it possible to allocate individual objects
larger than 512GB. As a result, a few tests that expected huge
allocations to fail needed to be changed to make even larger
allocations. However, at the moment attempting to allocate a humongous
object may cause the program to freeze for several minutes on Linux as
we fall back to probing every page with addrspace_free. That logic
(and this failure mode) will be removed in the next CL.

Fixes #10460.
Fixes #22204 (since it rewrites the code involved).

This slightly slows down compilebench and the x/benchmarks garbage
benchmark.

name       old time/op     new time/op     delta
Template       184ms ± 1%      185ms ± 1%    ~     (p=0.065 n=10+9)
Unicode       86.9ms ± 3%     86.3ms ± 1%    ~     (p=0.631 n=10+10)
GoTypes        599ms ± 0%      602ms ± 0%  +0.56%  (p=0.000 n=10+9)
Compiler       2.87s ± 1%      2.89s ± 1%  +0.51%  (p=0.002 n=9+10)
SSA            7.29s ± 1%      7.25s ± 1%    ~     (p=0.182 n=10+9)
Flate          118ms ± 2%      118ms ± 1%    ~     (p=0.113 n=9+9)
GoParser       147ms ± 1%      148ms ± 1%  +1.07%  (p=0.003 n=9+10)
Reflect        401ms ± 1%      404ms ± 1%  +0.71%  (p=0.003 n=10+9)
Tar            175ms ± 1%      175ms ± 1%    ~     (p=0.604 n=9+10)
XML            209ms ± 1%      210ms ± 1%    ~     (p=0.052 n=10+10)

(https://perf.golang.org/search?q=upload:20171231.4)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.23ms ± 1%  2.25ms ± 1%  +0.84%  (p=0.000 n=19+19)

(https://perf.golang.org/search?q=upload:20171231.3)

Relative to the start of the sparse heap changes (starting at and
including "runtime: fix various contiguous bitmap assumptions"),
overall slowdown is roughly 1% on GC-intensive benchmarks:

name        old time/op     new time/op     delta
Template        183ms ± 1%      185ms ± 1%  +1.32%  (p=0.000 n=9+9)
Unicode        84.9ms ± 2%     86.3ms ± 1%  +1.65%  (p=0.000 n=9+10)
GoTypes         595ms ± 1%      602ms ± 0%  +1.19%  (p=0.000 n=9+9)
Compiler        2.86s ± 0%      2.89s ± 1%  +0.91%  (p=0.000 n=9+10)
SSA             7.19s ± 0%      7.25s ± 1%  +0.75%  (p=0.000 n=8+9)
Flate           117ms ± 1%      118ms ± 1%  +1.10%  (p=0.000 n=10+9)
GoParser        146ms ± 2%      148ms ± 1%  +1.48%  (p=0.002 n=10+10)
Reflect         398ms ± 1%      404ms ± 1%  +1.51%  (p=0.000 n=10+9)
Tar             173ms ± 1%      175ms ± 1%  +1.17%  (p=0.000 n=10+10)
XML             208ms ± 1%      210ms ± 1%  +0.62%  (p=0.011 n=10+10)
[Geo mean]      369ms           373ms       +1.17%

(https://perf.golang.org/search?q=upload:20180101.2)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.22ms ± 1%  2.25ms ± 1%  +1.51%  (p=0.000 n=20+19)

(https://perf.golang.org/search?q=upload:20180101.3)

Change-Id: I5daf4cfec24b252e5a57001f0a6c03f22479d0f0
Reviewed-on: https://go-review.googlesource.com/85887
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
@golang golang locked and limited conversation to collaborators Jan 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants