
runtime: scavengelist disabled on 64k page size systems (ppc64) #9993

Closed
davecheney opened this issue Feb 25, 2015 · 27 comments

@davecheney
Contributor

I am sorry for the imprecise nature of this issue. The symptoms are random SIGSEGVs after processes have been running for several minutes. The most visible effect of this is that go test std will segfault halfway through the test run, making it impossible to complete a CI build.

Last year @mwhudson and I saw a similar issue with gccgo-4.9 compiled programs, which was eventually traced to the machine's page size (most ppc64 machines use 64k pages) being out of sync with the page size that the runtime uses.

In tracing some processes today I have found occurrences of munmap(2) and madvise(2) calls that are not using multiples of 64k:

munmap(0x3fff82b40000, 26532)           = 0

More worrying, I've found examples that don't use a multiple of 4096 bytes!

munmap(0x3fff80e10000, 102680)          = 0

/cc @randall77

@randall77
Contributor

Linux, right?

The runtime currently uses 8K pages, so everything should be a multiple of 8K even if the OS pages are 4K.

I don't see how an madvise call can possibly get a non-multiple of 8K. It is only called from sysUnused, and that is only called with a second argument that is ... << _PageShift.

There are a few places where munmap uses non-8K multiples. It's called from sysFree, and sysFree (and sysAlloc, for that matter) is called from some places with sizes that aren't necessarily multiples of the page size. Maybe sysAlloc/sysFree should round to the page size before calling mmap/munmap? It may not matter, but I'm not sure what mmap/munmap do with partial pages.

You might want to try setting _PageShift to 16 and see if that helps ppc64le. We probably don't want to leave that there, but it would be a good experiment.
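The rounding suggested here can be sketched with a small helper. A minimal sketch; roundUp is a hypothetical name for illustration, not the runtime's actual code:

```go
package main

import "fmt"

// roundUp rounds n up to the next multiple of align, which must be a
// power of two. This is the kind of adjustment sysAlloc/sysFree would
// need to apply before calling mmap/munmap.
func roundUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

func main() {
	// The unaligned munmap length from the strace output above,
	// rounded to 8K runtime pages and to 64K ppc64 kernel pages.
	fmt.Println(roundUp(26532, 8192))  // 32768
	fmt.Println(roundUp(26532, 65536)) // 65536
}
```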


@davecheney
Contributor Author

Yes, linux/ppc64le.

I'll try the page shift hack and also put some asserts in to check that everything is aligned to the page size.

@davecheney
Contributor Author

The plot thickens. With some asserts in place (and _PageShift still 13), someone passed a request for 1439992 bytes to sysAlloc

##### Building packages and commands for linux/ppc64le.
unaligned 1439992
panic: unaligned memory access
fatal error: panic on system stack

runtime stack:
runtime.throw(0x39cfd0, 0x15)
        /home/ubuntu/go/src/runtime/panic.go:511 +0xc0 fp=0x3fffce49e848 sp=0x3fffce49e830
runtime.gopanic(0x2ad460, 0xc208002000)
        /home/ubuntu/go/src/runtime/panic.go:338 +0xec fp=0x3fffce49e8c8 sp=0x3fffce49e848
runtime.checklen(0x15f8f8)
        /home/ubuntu/go/src/runtime/mem_linux.go:24 +0xe0 fp=0x3fffce49e900 sp=0x3fffce49e8c8
runtime.sysAlloc(0x15f8f8, 0x512550, 0x3fffce49e9e8)
        /home/ubuntu/go/src/runtime/mem_linux.go:62 +0x14 fp=0x3fffce49e938 sp=0x3fffce49e900
runtime.stkbucket(0x1, 0x120, 0x3fffce49ea28, 0x5, 0x20, 0xf0d01, 0x3fffce49ea60)
        /home/ubuntu/go/src/runtime/mprof.go:144 +0x60 fp=0x3fffce49e9e0 sp=0x3fffce49e938
runtime.mProf_Malloc(0xc208000000, 0x120)
        /home/ubuntu/go/src/runtime/mprof.go:237 +0xec fp=0x3fffce49eb60 sp=0x3fffce49e9e0
runtime.profilealloc(0x4e67e0, 0xc208000000, 0x120)
        /home/ubuntu/go/src/runtime/malloc.go:749 +0x120 fp=0x3fffce49eb88 sp=0x3fffce49eb60
runtime.mallocgc(0x120, 0x351fa0, 0x0, 0x2adc60)
        /home/ubuntu/go/src/runtime/malloc.go:654 +0x548 fp=0x3fffce49ec58 sp=0x3fffce49eb88
runtime.newobject(0x351fa0, 0x4e5ea0)
        /home/ubuntu/go/src/runtime/malloc.go:698 +0x80 fp=0x3fffce49ec80 sp=0x3fffce49ec58
runtime.malg(0x8000, 0x4e61a0)
        /home/ubuntu/go/src/runtime/proc1.go:1992 +0x30 fp=0x3fffce49ecb8 sp=0x3fffce49ec80
runtime.mpreinit(0x4e67e0)
        /home/ubuntu/go/src/runtime/os1_linux.go:169 +0x34 fp=0x3fffce49ecd0 sp=0x3fffce49ecb8
runtime.mcommoninit(0x4e67e0)
        /home/ubuntu/go/src/runtime/proc1.go:115 +0x118 fp=0x3fffce49ecf8 sp=0x3fffce49ecd0
runtime.schedinit()
        /home/ubuntu/go/src/runtime/proc1.go:57 +0x98 fp=0x3fffce49ed40 sp=0x3fffce49ecf8
runtime.rt0_go(0x3fffce49f702, 0x3fffce49f70b, 0x3fffce49f70c, 0x3fffce49f70f, 0x3fffce49f713, 0x0, 0x3fffce49f717, 0x3fffce49f71e, 0x3fffce49f730, 0x3fffce49f740, ...)
        /home/ubuntu/go/src/runtime/asm_ppc64x.s:69 +0xa4 fp=0x3fffce49ed80 sp=0x3fffce49ed40

@davecheney
Contributor Author

Instrumenting sysAlloc may have been a bit aggressive, but this assertion in sysFree is very worrying -- it's not even aligned to an 8k page, and the remaining 7.3k will be discarded by the operating system and probably zeroed when we touch it next.

##### Building packages and commands.                                                                                                       [74/9576]
runtime
# runtime
unaligned 58152
panic: unaligned memory access
fatal error: panic on system stack

runtime stack:
runtime.throw(0x391570, 0x15)
        /home/ubuntu/go/src/runtime/panic.go:511 +0xc0 fp=0x3fffd0feb108 sp=0x3fffd0feb0f0
runtime.gopanic(0x2f7b00, 0xc20b6e1f60)
        /home/ubuntu/go/src/runtime/panic.go:338 +0xec fp=0x3fffd0feb188 sp=0x3fffd0feb108
runtime.checklen(0xe328)
        /home/ubuntu/go/src/runtime/mem_linux.go:24 +0xe0 fp=0x3fffd0feb1c0 sp=0x3fffd0feb188
runtime.sysFree(0x3fff7b930000, 0xe328, 0x513700)
        /home/ubuntu/go/src/runtime/mem_linux.go:88 +0x2c fp=0x3fffd0feb1e0 sp=0x3fffd0feb1c0
runtime.gcCopySpans()
        /home/ubuntu/go/src/runtime/mgc.go:607 +0xd8 fp=0x3fffd0feb208 sp=0x3fffd0feb1e0
runtime.gcMark(0xb831317ec7648)
        /home/ubuntu/go/src/runtime/mgc.go:452 +0x8c fp=0x3fffd0feb348 sp=0x3fffd0feb208
runtime.gc.func2()
        /home/ubuntu/go/src/runtime/mgc.go:371 +0x40 fp=0x3fffd0feb380 sp=0x3fffd0feb348
runtime.systemstack(0x4d9030)
        /home/ubuntu/go/src/runtime/asm_ppc64x.s:236 +0x90 fp=0x3fffd0feb388 sp=0x3fffd0feb380
runtime.mstart()
        /home/ubuntu/go/src/runtime/proc1.go:682 fp=0x3fffd0feb388 sp=0x3fffd0feb388
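The instrumentation firing in the traces above amounts to a power-of-two alignment check. A minimal sketch, assuming the 8k pages (_PageShift of 13) mentioned earlier; aligned is a hypothetical name, not the runtime's actual code:

```go
package main

import "fmt"

const pageShift = 13            // _PageShift: 8k logical heap pages
const pageSize = 1 << pageShift // 8192

// aligned reports whether n is a multiple of the runtime's logical
// page size; the checklen assert in the traces panics when it isn't.
func aligned(n uintptr) bool {
	return n&(pageSize-1) == 0
}

func main() {
	fmt.Println(aligned(1439992)) // the sysAlloc request above: false
	fmt.Println(aligned(58152))   // the sysFree length above: false
	fmt.Println(aligned(0x10000)) // a whole 64k multiple: true
}
```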

@dvyukov
Member

dvyukov commented Feb 25, 2015

but this assertion in sysFree is very worrying -- it's not even aligned to an 8k page

what's wrong with it?
it is ok to mmap/munmap less than a page

You might want to try setting _PageShift to 16

_PageSize is only for heap and is an internal detail
don't change it for the sake of OS
if we need to tune something then it is _PhysPageSize

@davecheney
Contributor Author

My understanding is that mmap, munmap, etc. always work on page-sized quantities.

The result is that if you madvise DONT_NEED a 4k section on a 64k system, the kernel will round this up to 64k and drop 60-odd kB of data that the GC thinks is active.

We hit exactly this problem using gccgo on 64k systems last year.

Sorry for the brevity (phone)
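A minimal sketch of this failure mode. kernelRange is a hypothetical helper modeling the kernel operating on whole pages (start rounded down, end rounded up), not an actual kernel or runtime API:

```go
package main

import "fmt"

// kernelRange models the page range the kernel actually operates on
// for a request covering [addr, addr+length).
func kernelRange(addr, length, page uintptr) (start, end uintptr) {
	start = addr &^ (page - 1)
	end = (addr + length + page - 1) &^ (page - 1)
	return
}

func main() {
	const physPage = 65536 // 64k kernel pages on most ppc64 machines

	// The scavenger asks to drop a single 4k chunk at a 64k-aligned
	// address; the kernel rounds the range up to a whole 64k page and
	// so discards the ~60k of live heap data sitting next to it.
	start, end := kernelRange(0x3fff82b40000, 4096, physPage)
	fmt.Printf("asked for 4096 bytes, kernel drops %d bytes\n", end-start)
}
```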

@dvyukov
Member

dvyukov commented Feb 26, 2015

Is the problem only with madvise or with mmap/munmap as well?

@davecheney
Contributor Author

We saw the issue with the scavenger madvising unused 4k pages of the heap,
and the kernel rounding that up to 64k pages.


@dvyukov
Member

dvyukov commented Feb 26, 2015

I see.
As always, the man page is super helpful; it talks about everything except the important parts:

    The Linux implementation requires that the address addr be page-aligned,
    and allows length to be zero. If there are some parts of the specified
    address range that are not mapped, the Linux version of madvise() ignores
    them and applies the call to the rest (but returns ENOMEM from the system
    call, as it should).

If we saw that it happens in reality, then I guess we need to fix it.
However, mmap/munmap with unaligned size are OK, don't touch them. Also don't change _PageSize and don't use it outside of heap code. _PhysPageSize is what you need.

@davecheney
Contributor Author

I believe the problem is here

https://github.com/golang/go/blob/master/src/runtime/mheap.go#L731

Attached is the patch we sent to gccgo last year, but reading over it I am unsure if it is safe to use here. I'm not sure it is safe to widen the range of memory freed as there is no signal to the GC that we've freed a larger area than the scavenge free list wanted us to do.

diff -r 3a53301d24d7 libgo/runtime/mheap.c
--- a/libgo/runtime/mheap.c Tue Apr 22 16:43:35 2014 -0700
+++ b/libgo/runtime/mheap.c Thu Apr 24 21:18:35 2014 -0700
@@ -387,7 +387,7 @@
 static uintptr
 scavengelist(MSpan *list, uint64 now, uint64 limit)
 {
-   uintptr released, sumreleased;
+   uintptr released, sumreleased, start, end, pagesize;
    MSpan *s;

    if(runtime_MSpanList_IsEmpty(list))
@@ -400,7 +400,17 @@
            mstats.heap_released += released;
            sumreleased += released;
            s->npreleased = s->npages;
-           runtime_SysUnused((void*)(s->start << PageShift), s->npages << PageShift);
+
+           start = s->start << PageShift;
+           end = start + (s->npages << PageShift);
+
+           // Round start up and end down to ensure we
+           // are acting on entire pages.
+           pagesize = getpagesize();
+           start = ROUND(start, pagesize);
+           end &= ~(pagesize - 1);
+           if(end > start)
+               runtime_SysUnused((void*)start, end - start);
        }
    }
    return sumreleased;

@dvyukov
Member

dvyukov commented Feb 26, 2015

not sure it is safe to widen the range of memory freed

as far as I see, you only shrink the region from both ends

we will need to ensure that _PageSize <= _PhysPageSize on platforms where SysUnused is not a no-op

also I suspect that such a change can make the scavenger less efficient on platforms with _PhysPageSize > _PageSize, because even consecutive unused pages can end up not being madvised (though, of course, it is better than crashing)

@davecheney davecheney changed the title runtime: ppc64le build very unstable runtime: scavengelist can corrupt heap memory on 64k page size systems Feb 26, 2015
davecheney added a commit that referenced this issue Feb 26, 2015
Update #9993

If the physical page size of the machine is larger than the logical
heap page size, for example 8k logical, 64k physical, then madvise(2) will
round up the requested amount to a 64k boundary and may discard pages
close to the page being madvised.

This patch disables the scavenger in these situations, which at the moment
is only ppc64 and ppc64le systems. NaCl also uses a 64k page size, but
it's not clear if it is affected by this problem.

Change-Id: Ib897f8d3df5bd915ddc0b510f2fd90a30ef329ca
Reviewed-on: https://go-review.googlesource.com/6091
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
@rsc rsc added this to the Go1.5Maybe milestone Apr 10, 2015
@rsc rsc modified the milestones: Go1.6Early, Go1.5Maybe Jul 21, 2015
@rsc rsc changed the title runtime: scavengelist can corrupt heap memory on 64k page size systems runtime: scavengelist can corrupt heap memory on 64k page size systems (ppc64) Jul 21, 2015
@rsc rsc changed the title runtime: scavengelist can corrupt heap memory on 64k page size systems (ppc64) runtime: scavengelist disabled on 64k page size systems (ppc64) Nov 24, 2015
@rsc rsc modified the milestones: Unplanned, Go1.6Early Nov 24, 2015
@efirestone

Hey all, we're interested in creating a Go Mobile library, but this issue is currently a blocker for us. On arm64 devices this issue means that the Go GC will increasingly own memory within an iOS (and presumably Android) process and never give it back, which can eventually get to the point where the memory available to the rest of the process is too small to run normal operations, and the OS will kill the app due to memory pressure.

More discussion here: https://groups.google.com/forum/#!topic/golang-nuts/-7ehtvFzJj0

Thanks!

@aclements
Member

As a first step, we can certainly make scavengelist or sysUnused round the boundaries inward to _PhysPageSize. That will at least help with scavenging memory that backed large allocations and, as free spans get coalesced, it will eventually release coalesced spans that are sufficiently large (at least most of the time).

I have some ideas for how to make the scavenger work well in environments where the heap's granularity is finer than the system's granularity. Our current heuristic of releasing a free span only once no page in that span has been used for 5 minutes, in addition to being ridiculously conservative, is a good way to turn the heap into Swiss cheese. To a first order, the age of a particular span is irrelevant (there is some weak temporal locality). What matters is not which spans we release, but how many we release, so we could choose how many using a temporal heuristic, but choose which spans to release using a spatial heuristic. If we bias this toward large ranges, we'll free memory much more effectively on platforms with large physical pages.

I'm hoping to get to this early in 1.7.
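The inward rounding described above can be sketched as follows. trimToPhysPages is a hypothetical helper for illustration; the real change belongs inside the runtime's sysUnused, not in a standalone function like this:

```go
package main

import "fmt"

// trimToPhysPages rounds start up and end down to physical page
// boundaries, so that releasing [s, e) never touches a partial kernel
// page. ok is false when the span contains no whole physical page and
// should simply be skipped rather than released.
func trimToPhysPages(start, end, physPage uintptr) (s, e uintptr, ok bool) {
	s = (start + physPage - 1) &^ (physPage - 1)
	e = end &^ (physPage - 1)
	if e <= s {
		return 0, 0, false
	}
	return s, e, true
}

func main() {
	const physPage = 0x10000 // 64k

	// A span big enough to contain a whole 64k page: trimmed inward.
	s, e, ok := trimToPhysPages(0x6e000, 0x84000, physPage)
	fmt.Printf("%#x-%#x ok=%v\n", s, e, ok) // 0x70000-0x80000 ok=true

	// A 24k span straddling a 64k boundary: nothing safe to release.
	_, _, ok = trimToPhysPages(0x6e000, 0x74000, physPage)
	fmt.Println(ok) // false
}
```

Note the efficiency caveat from earlier in the thread: trimming both ends means small free spans on large-page systems may never be released at all.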

@gopherbot

CL https://golang.org/cl/22064 mentions this issue.

@gopherbot

CL https://golang.org/cl/22065 mentions this issue.

@gopherbot

CL https://golang.org/cl/22061 mentions this issue.

@gopherbot

CL https://golang.org/cl/22066 mentions this issue.

@aclements
Member

@davecheney, would you mind stress testing CL 22066 on a PPC or ARM64? Based on your original post, it sounds like go test std is sufficient to trigger the problem. I suspect setting GODEBUG=scavenge=1 should stress it much harder.

@davecheney
Contributor Author

Thanks Austin. I've asked @mwhudson to take a look as I'm travelling ATM.


@mwhudson
Contributor

I'll test this. To be clear, it should be a system where "getconf PAGESIZE" says 65536? (The arm64 machine I have access to appears to have 4k pages, but the ppc64 one is 64k)

@aclements
Member

Thanks! Yes, in that case you should stress test on the ppc64. Though it would be good to make sure it passes all.bash on the arm64 just to test the case where physical pages are smaller than sys.PhysPageSize.

@mwhudson
Contributor

GODEBUG=scavenge=1 is quite slow :-)

@mwhudson
Contributor

The tests didn't pass cleanly with scavenge=1 but I sort of get the feeling that's not the fault of that CL: https://gist.github.com/mwhudson/21badf1d67536a9fc3719d19cc5865b6

I'm running the same test on master now.

@aclements
Member

GODEBUG=scavenge=1 is quite slow :-)

Yeah. :) Unfortunately, that also turns the forced GC rate way up, which is probably where a lot of the slowdown is coming from.

I agree that those failures don't look likely to be related to this CL.

As a baseline, I tried GODEBUG=scavenge=1 go test std and just go test std on linux/amd64 on master (12e3b18, the branch point of CL 22066) and in both cases got one failure:

--- FAIL: TestWaitGroupMisuse2 (2.09s)
        waitgroup_test.go:114: Should panic
        waitgroup_test.go:84: Unexpected panic: <nil>
FAIL
FAIL    sync    2.398s

Also, with scavenge=1, math/big took 297 seconds, so I'm not too surprised it timed out on arm64 and ppc64. The other failures in the gist you posted all looked networking-related.

@mwhudson
Contributor

The test run on master had the same failures so yeah. Test passed afaict.

@aclements
Member

Thanks! I'll push these in a little bit.

mk0x9 pushed a commit to mk0x9/go that referenced this issue Apr 16, 2016
Currently several different Linux architectures have separate copies
of the auxv parser. Bring these all together into a single copy of the
parser that calls out to a per-arch handler for each tag/value pair.
This is in preparation for handling common auxv tags in one place.

For golang#9993.

Change-Id: Iceebc3afad6b4133b70fca7003561ae370445c10
Reviewed-on: https://go-review.googlesource.com/22061
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
mk0x9 pushed a commit to mk0x9/go that referenced this issue Apr 16, 2016
The runtime hard-codes an assumed physical page size. If this is
smaller than the kernel's page size or not a multiple of it, sysUnused
may incorrectly release more memory to the system than intended.

Add a runtime startup check that the runtime's assumed physical page
size is compatible with the kernel's physical page size.

For golang#9993.

Change-Id: Ida9d07f93c00ca9a95dd55fc59bf0d8a607f6728
Reviewed-on: https://go-review.googlesource.com/22064
Reviewed-by: Rick Hudson <rlh@golang.org>
mk0x9 pushed a commit to mk0x9/go that referenced this issue Apr 16, 2016
If sysUnused is passed an address or length that is not aligned to the
physical page boundary, the kernel will unmap more memory than the
caller wanted. Add a check for this.

For golang#9993.

Change-Id: I68ff03032e7b65cf0a853fe706ce21dc7f2aaaf8
Reviewed-on: https://go-review.googlesource.com/22065
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
@gopherbot

CL https://golang.org/cl/27403 mentions this issue.

gopherbot pushed a commit that referenced this issue Sep 6, 2016
TestFreeOSMemory was disabled on many arches because of issue #9993.
Since that's been fixed, enable the test everywhere.

Change-Id: I298c38c3e04128d9c8a1f558980939d5699bea03
Reviewed-on: https://go-review.googlesource.com/27403
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
@golang golang locked and limited conversation to collaborators Aug 19, 2017