runtime: running Go code on OpenBSD gomote fails when not running as root #35568

Closed
ianlancetaylor opened this issue Nov 13, 2019 · 11 comments
Labels
FrozenDueToAge, NeedsInvestigation, OS-OpenBSD, release-blocker

@ianlancetaylor
Contributor

I don't know what is going on here, but recording since something is wrong.

When I use gomote run with the openbsd-amd64-62 gomote, everything works as expected. When I use gomote ssh to ssh into the gomote, the go tool consistently fails with the following stack trace.

The only obvious difference is that gomote run runs as root but gomote ssh does not.

CC @mknyszek @aclements @bradfitz

fatal error: failed to reserve page bitmap memory

runtime stack:
runtime.throw(0xa4504f, 0x24)
        /tmp/workdir/go/src/runtime/panic.go:1106 +0x72 fp=0x7f7ffffd5018 sp=0x7f7ffffd4fe8 pc=0x4331a2
runtime.(*pageAlloc).init(0xeb7b88, 0xeb7b80, 0xecff58)
        /tmp/workdir/go/src/runtime/mpagealloc.go:239 +0x162 fp=0x7f7ffffd5060 sp=0x7f7ffffd5018 pc=0x428c12
runtime.(*mheap).init(0xeb7b80)
        /tmp/workdir/go/src/runtime/mheap.go:694 +0x274 fp=0x7f7ffffd5088 sp=0x7f7ffffd5060 pc=0x425de4
runtime.mallocinit()
        /tmp/workdir/go/src/runtime/malloc.go:471 +0xff fp=0x7f7ffffd50b8 sp=0x7f7ffffd5088 pc=0x40c5af
runtime.schedinit()
        /tmp/workdir/go/src/runtime/proc.go:545 +0x60 fp=0x7f7ffffd5110 sp=0x7f7ffffd50b8 pc=0x436700
runtime.rt0_go(0x7f7ffffd5148, 0x1, 0x7f7ffffd5148, 0x0, 0x0, 0x1, 0x7f7ffffd5238, 0x0, 0x7f7ffffd523b, 0x7f7ffffd5254, ...)
        /tmp/workdir/go/src/runtime/asm_amd64.s:214 +0x125 fp=0x7f7ffffd5118 sp=0x7f7ffffd5110 pc=0x45eff5

ianlancetaylor added the NeedsInvestigation and release-blocker labels Nov 13, 2019
ianlancetaylor added this to the Go1.14 milestone Nov 13, 2019
mknyszek self-assigned this Nov 13, 2019

@mknyszek
Contributor

If I were to guess, there's an RLIMIT_AS set for non-root users or something. I'll look into this now.

@bradfitz
Contributor

bradfitz commented Nov 13, 2019

People who like this bug also like #10719.

/cc @bcmills

@ianlancetaylor
Contributor Author

For root, bash's ulimit -v reports 33562624. For non-root, it reports 1576960.
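
A process can also read its own limits directly instead of going through the shell. The following is a minimal sketch, assuming a Linux or OpenBSD target where `syscall.Rlimit` carries `uint64` fields. It reads `RLIMIT_DATA`, which later comments in this thread identify as the limit that matters on OpenBSD. The helper name `dataLimitKiB` is made up for illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// dataLimitKiB returns the soft and hard RLIMIT_DATA values in KiB.
// The name is hypothetical; only syscall.Getrlimit is a real API here.
func dataLimitKiB() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_DATA, &lim); err != nil {
		return 0, 0, err
	}
	// The explicit conversions keep the sketch compiling on platforms
	// where the fields are not already uint64.
	return uint64(lim.Cur) / 1024, uint64(lim.Max) / 1024, nil
}

func main() {
	soft, hard, err := dataLimitKiB()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("RLIMIT_DATA soft=%d KiB hard=%d KiB\n", soft, hard)
}
```

Running this once as root and once as a regular user should show whether the two accounts really start with different limits.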

@mknyszek
Contributor

@ianlancetaylor I investigated this with someone who knows OpenBSD a bit.

The limit the kernel checks for a PROT_NONE anonymous mapping is RLIMIT_DATA, which is limited for non-root users in login.conf. We can fix this on our builders by making datasize-cur and datasize-max unlimited, but if OpenBSD has a low default then that's a problem, since Go no longer works out of the box on a newly-installed OpenBSD image.

Perhaps there's a workaround here but I need to give it some thought.

It's a little bit weird to me that a PROT_NONE mapping counts toward this on any platform. Linux does this too, but its default virtual address space limit is unlimited for everyone, not 768 MiB.

@ianlancetaylor
Contributor Author

Just a note that the same problem still happens on OpenBSD 6.4.

@bradfitz
Contributor

/cc @mdempsky as FYI just because he likes OpenBSD issues.

@mdempsky
Member

It's been a while since I've looked at OpenBSD's mmap implementation. I seem to recall it took the stance that PROT_NONE mappings still map data (and thus count towards RLIMIT_DATA), just without read/write access.

I see that http://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/lang/go/Makefile?rev=1.75 uses ulimit -d $(ulimit -H -d) to raise the RLIMIT_DATA soft limit to the hard limit. But maybe that only works because ports builders usually have login class "staff" or "pbuild" (which set datasize-max=infinity), whereas gomote ssh just gets the default login class (with datasize-max=768M)?

/cc @4a6f656c

@gopherbot

Change https://golang.org/cl/207497 mentions this issue: runtime: convert page allocator bitmap to sparse array

gopherbot pushed a commit that referenced this issue Dec 3, 2019
Currently the page allocator bitmap is implemented as a single giant
memory mapping which is reserved at init time and committed as needed.
This causes problems on systems that don't handle large uncommitted
mappings well, or institute low virtual address space defaults as a
memory limiting mechanism.

This change modifies the implementation of the page allocator bitmap
away from a directly-mapped set of bytes to a sparse array in the same vein
as mheap.arenas. This will hurt performance a little but the biggest
gains are from the lockless allocation possible with the page allocator,
so the impact of this extra layer of indirection should be minimal.

In fact, this is exactly what we see:
    https://perf.golang.org/search?q=upload:20191125.5

This reduces the amount of mapped (PROT_NONE) memory needed on systems
with 48-bit address spaces to ~600 MiB down from almost 9 GiB. The bulk
of this remaining memory is used by the summaries.

Go processes with 32-bit address spaces now always commit to 128 KiB of
memory for the bitmap. Previously it would only commit the pages in the
bitmap which represented the range of addresses (lowest address to
highest address, even if there are unused regions in that range) used by
the heap.

Updates #35568.
Updates #35451.

Change-Id: I0ff10380156568642b80c366001eefd0a4e6c762
Reviewed-on: https://go-review.googlesource.com/c/go/+/207497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
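
The idea behind the commit message above can be illustrated with a toy two-level bitmap. This is a simplified sketch, not the runtime's actual data structure: the type `sparseBitmap`, its methods, and every constant are made up for the example. The point it demonstrates is that second-level blocks are allocated only when a page in their range is first touched, so sparse use of a large index space needs little memory, at the price of one extra pointer dereference per lookup.

```go
package main

import "fmt"

// All sizes below are hypothetical and chosen only to keep the example small.
const (
	chunkBits = 512     // pages tracked by one chunk
	l2Entries = 1 << 10 // chunks per second-level block
	l1Entries = 1 << 8  // second-level blocks in the top-level array
)

// chunk holds one bit per page for chunkBits pages.
type chunk [chunkBits / 64]uint64

// sparseBitmap is a two-level array. Second-level blocks stay nil until a
// page inside their range is set.
type sparseBitmap struct {
	l1 [l1Entries]*[l2Entries]chunk
}

// set marks a page as in use, allocating its second-level block on demand.
// page must be below l1Entries*l2Entries*chunkBits.
func (b *sparseBitmap) set(page uint64) {
	ci := page / chunkBits
	i1, i2 := ci/l2Entries, ci%l2Entries
	if b.l1[i1] == nil {
		b.l1[i1] = new([l2Entries]chunk)
	}
	bit := page % chunkBits
	b.l1[i1][i2][bit/64] |= uint64(1) << (bit % 64)
}

// get reports whether a page is marked. Pages in blocks that were never
// allocated read as unset.
func (b *sparseBitmap) get(page uint64) bool {
	ci := page / chunkBits
	l2 := b.l1[ci/l2Entries]
	if l2 == nil {
		return false
	}
	bit := page % chunkBits
	return l2[ci%l2Entries][bit/64]&(uint64(1)<<(bit%64)) != 0
}

// committed counts the second-level blocks that have been allocated.
func (b *sparseBitmap) committed() int {
	n := 0
	for _, l2 := range b.l1 {
		if l2 != nil {
			n++
		}
	}
	return n
}

func main() {
	var b sparseBitmap
	b.set(3)
	b.set(100_000_000) // far from page 3, so it lands in a different block
	fmt.Println(b.get(3), b.get(4), b.get(100_000_000))
	fmt.Println("second-level blocks allocated:", b.committed())
}
```

A flat bitmap covering the same index range would need its whole backing store reserved up front, which is the pattern that ran into the resource limit in this issue.
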
@4a6f656c
Contributor

4a6f656c commented Dec 4, 2019

The login class will be the reason why the allocation worked when run as root, but not when run as a normal user. However, even when a user has datasize-max=infinity (and increases the data size soft limit), there is still an upper bound based on the amount of memory available to the machine. As such, on a laptop with 4GB of RAM, the ~8.5GB PROT_NONE allocation was still failing, even when run as a user with a login class of staff (or as root). It presumably only worked on the builders as root because a significant amount of memory was available.

The same failure could also be triggered under Linux via ulimit -v (some code removed around 2017 had a comment noting that sysReserve could fail on 64-bit systems, either due to kernel enforced constraints or ulimit -v).

Obviously this was a pretty significant regression from Go 1.13. I can confirm that with the sparse array change, I can once again build Go with a data size limit of 2GB and run a Go binary with a data size limit of 768MB. There appears to have been another regression in the memory requirements for compiling cmd/compile/internal/ssa, though: 1.5GB is sufficient for Go 1.13.4.

@ianlancetaylor
Contributor Author

@mknyszek Is there anything else to do on this issue or do we think that it is fixed?

@mknyszek
Contributor

mknyszek commented Dec 5, 2019

@4a6f656c It's unfortunate that virtual address space is limited in this way, but c'est la vie. I'm glad the sparse array change helped.

We have also been aware of the ulimit -v failure mode on Linux for a while. The default is infinity because virtual address space is cheap (for example, I can make a 2 TiB PROT_NONE mapping without issue on Linux, with either default or strict overcommit rules set). Generally speaking, ulimit -v is not the most accurate way to limit physical memory use (it's only per-process!), and cgroups do much better, so I don't think folks tend to use it much anymore.

@ianlancetaylor I don't think there's anything else we want to do now in terms of reducing the amount of PROT_NONE memory mapped. We tried a few things and they were either quite complicated or had other problems.

7 participants