
runtime: support larger _MaxGomaxprocs #15131

Closed
leitao opened this issue Apr 5, 2016 · 24 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.

leitao commented Apr 5, 2016

1. What version of Go are you using (go version)?
1.6 and the upstream release

2. What operating system and processor architecture are you using (go env)?
Ubuntu 16.04 Final Beta:

GOARCH="ppc64le"
GOBIN=""
GOEXE=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/lib/go-1.6"
GOTOOLDIR="/usr/lib/go-1.6/pkg/tool/linux_ppc64le"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

3. What did you do?
Ran a docker command in my environment.

4. What did you expect to see?
Docker running without a stack trace.

5. What did you see instead?
An error and an unexpected exit:

procresize: invalid arg

leitao (Author) commented Apr 5, 2016

I am going to propose the following patch:

    Author: Breno Leitao <breno.leitao@gmail.com>
    Date:   Mon Apr 4 16:09:55 2016 -0400

    Increase _MaxGomaxprocs to support big machines

    Currently Go does not support machines with more than 256 CPUs, such as
    IBM's E880, which can host 1536 hardware threads when configured with
    192 CPU cores and SMT (Simultaneous Multi-Threading) 8.

    For example, when running a Go program on this machine, I got the
    following error:

      procresize: invalid arg

    This is because of the following code:

      _MaxGomaxprocs = 1 << 8

      if old < 0 || old > _MaxGomaxprocs || nprocs <= 0 || nprocs > _MaxGomaxprocs {
          throw("procresize: invalid arg")
      }

    This patch redefines _MaxGomaxprocs as 1 << 12.

diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
index e0137f7..508cecb 100644
--- a/src/runtime/runtime2.go
+++ b/src/runtime/runtime2.go
@@ -472,7 +472,7 @@ type p struct {
 const (
    // The max value of GOMAXPROCS.
    // There are no fundamental restrictions on the value.
-   _MaxGomaxprocs = 1 << 8
+   _MaxGomaxprocs = 1 << 12
 )

 type schedt struct {

martisch (Contributor) commented Apr 5, 2016

While we're at it, can we go to 8192? The SGI UV 3000 supports 8192 threads with Intel CPUs.

bradfitz (Contributor) commented Apr 5, 2016

Btw, this would increase the size of two tables which are in all binaries.

Currently:

$ go tool nm --size hello1 | sort -k2 -n | grep -E 'allp|pdesc'
   8a7c0       2080 B runtime.allp
   a65c0       8192 B runtime.pdesc

At 1<<12, they would grow 16x, to about 160 KB. That's quite an increase; 1<<13 would be about 320 KB.

That's getting kinda large. @crawshaw has been trying to make binaries smaller.

We might have to switch to allocating those tables dynamically.

@bradfitz bradfitz changed the title Increase _MaxGomaxprocs to support big machines runtime: support larger _MaxGomaxprocs Apr 5, 2016
@bradfitz bradfitz added this to the Go1.7 milestone Apr 5, 2016
minux (Member) commented Apr 5, 2016 via email

bradfitz (Contributor) commented Apr 5, 2016

> Those tables are zero initialized, so they won't take space in binaries.

Ah, indeed. And confirmed.

martisch (Contributor) commented Apr 5, 2016

I agree there is no need to make 99.999% of systems suffer for a few big-iron systems if the change increases the memory footprint.

Go 1.4 seems to work fine on the SGI UV 2000 I have access to.
It's an easy fix, so one can just change the constant before compiling tip.

Looks like allp is a static array, so dynamically sizing it as a slice might still have some performance implications.

I quickly tested the change and Go tip works now, but I got one error running the tests that I guess is due to the server using an older SUSE Linux Enterprise/GCC version and is not related:

/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: /tmp/ccBcRBK4.o: relocation R_X86_64_32 against `.text' can not be used when making a shared object; recompile with -fPIC
/tmp/ccBcRBK4.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
No support for -pie found, skip cgo PIE test.

mwhudson (Contributor) commented Apr 5, 2016

There is another issue here: Go programs crash on startup on systems where ncpu() returns > _MaxGomaxprocs. That seems like it should be an easy fix for 1.6.1, by just clamping the value?

leitao (Author) commented Apr 6, 2016

Right. I understand that if the machine has more than _MaxGomaxprocs CPUs, the application should still be able to run, just without exploiting all the CPUs, only _MaxGomaxprocs of them.

As of today, the application crashes.

gopherbot pushed a commit that referenced this issue Apr 7, 2016
So that all Go processes do not die on startup on a system with >256 CPUs.

I tested this by hacking osinit to set ncpu to 1000.

Updates #15131

Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
mwhudson (Contributor) commented Apr 7, 2016

I'd like to propose the fix I just landed for 1.6.X. Should I file a separate issue just for the clamping fix and leave this one for how to handle a larger ncpu more gracefully?

gopherbot commented

CL https://golang.org/cl/21630 mentions this issue.

bradfitz (Contributor) commented Apr 7, 2016

@mwhudson, Go 1.6.x releases are generally only for things with no workarounds. There is a workaround for this, though, right? The user can set GOMAXPROCS=N explicitly in their environment.

mwhudson (Contributor) commented Apr 7, 2016

I ... guess so? Seems pretty hostile to a random user who just wants to run docker though.

(I'll pull this into Go in 16.04 anyway I suspect so I don't really care but...)

bradfitz (Contributor) commented Apr 7, 2016

That's a way to look at it: there's nothing that Docker can do to work around it to give their users a working binary. So maybe it is fair game. It definitely seems low risk.

davecheney (Contributor) commented Apr 7, 2016 via email

On Thu, Apr 7, 2016 at 12:42 PM, Michael Hudson-Doyle wrote:

> I ... guess so? Seems pretty hostile to a random user who just wants to run docker though.

It's not every random user, only the subset of those who have been given access to an LPAR with more than 256 logical processors.



binarycrusader (Contributor) commented

This will be a problem for the SPARC port too; some systems have up to 512 cores / 4,096 strands.

leitao (Author) commented Apr 18, 2016

Any feedback on this problem?
Should we increase _MaxGomaxprocs even more, or do you think a more complex patch is required?

minux (Member) commented Apr 18, 2016 via email

gopherbot commented

CL https://golang.org/cl/22206 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 19, 2016
So that all Go processes do not die on startup on a system with >256 CPUs.

I tested this by hacking osinit to set ncpu to 1000.

Updates #15131
Fixes #15160

Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-on: https://go-review.googlesource.com/22206
@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 11, 2016
@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016
gopherbot commented

CL https://golang.org/cl/45575 mentions this issue.

gopherbot commented

CL https://golang.org/cl/45572 mentions this issue.

gopherbot commented

CL https://golang.org/cl/45573 mentions this issue.

gopherbot commented

CL https://golang.org/cl/45574 mentions this issue.

gopherbot commented

CL https://golang.org/cl/45673 mentions this issue.

gopherbot pushed a commit that referenced this issue Jun 13, 2017
Currently MaxGomaxprocs is 256. The previous CL saved enough per-P
static space that we can quadruple MaxGomaxprocs (and hence the static
size of allp) and still come out ahead.

This is safe for Go 1.9. In Go 1.10 we'll eliminate the hard-coded
limit entirely.

Updates #15131.

Change-Id: I919ea821c1ce64c27812541dccd7cd7db4122d16
Reviewed-on: https://go-review.googlesource.com/45673
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Jun 14, 2017
There are currently two arrays indexed by P ID: allp and pdesc.
Consolidate these by moving the pdesc fields into type p so they can
be indexed off allp along with all other per-P state.

For #15131.

Change-Id: Ib6c4e6e7612281a1171ba4a0d62e52fd59e960b4
Reviewed-on: https://go-review.googlesource.com/45572
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
aclements (Member) commented

Increased the limit to 1024 for Go 1.9 (without increasing the memory footprint). The remaining queued CLs will eliminate the limit for Go 1.10.

@aclements aclements modified the milestones: Go1.10Early, Go1.9 Jun 14, 2017
@aclements aclements self-assigned this Jun 14, 2017
@bradfitz bradfitz added release-blocker early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
gopherbot pushed a commit that referenced this issue Sep 27, 2017
This makes it possible to eliminate the hard cap on GOMAXPROCS.

Updates #15131.

Change-Id: I4c422b340791621584c118a6be1b38e8a44f8b70
Reviewed-on: https://go-review.googlesource.com/45573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
gopherbot pushed a commit that referenced this issue Sep 27, 2017
allp now has length gomaxprocs, which means none of allp[i] are nil or
in state _Pdead. This lets us replace several different styles of loops
over allp with normal range loops.

for i := 0; i < gomaxprocs; i++ { ... } loops can simply range over
allp. Likewise, range loops over allp[:gomaxprocs] can just range over
allp.

Loops that check for p == nil || p.state == _Pdead don't need to check
this any more.

Loops that check for p == nil don't have to check this *if* dead Ps
don't affect them. I checked that all such loops are, in fact,
unaffected by dead Ps. One loop was potentially affected, which this
fixes by zeroing p.gcAssistTime in procresize.

Updates #15131.

Change-Id: Ifa1c2a86ed59892eca0610360a75bb613bc6dcee
Reviewed-on: https://go-review.googlesource.com/45575
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators Sep 27, 2018