
cmd/go,cmd/compile: let cmd/compile choose how much concurrency to use #48497

Open
bcmills opened this issue Sep 20, 2021 · 13 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. ToolSpeed
Milestone
Backlog

Comments

@bcmills
Contributor

bcmills commented Sep 20, 2021

In reviewing CL 351049, my attention was drawn to a complicated function that cmd/go uses to decide whether to request cmd/compile to use internal concurrency:

// gcBackendConcurrency returns the backend compiler concurrency level for a package compilation.
func gcBackendConcurrency(gcflags []string) int {
	// First, check whether we can use -c at all for this compilation.
	canDashC := concurrentGCBackendCompilationEnabledByDefault

	switch e := os.Getenv("GO19CONCURRENTCOMPILATION"); e {
	case "0":
		canDashC = false
	case "1":
		canDashC = true
	case "":
		// Not set. Use default.
	default:
		log.Fatalf("GO19CONCURRENTCOMPILATION must be 0, 1, or unset, got %q", e)
	}

CheckFlags:
	for _, flag := range gcflags {
		// Concurrent compilation is presumed incompatible with any gcflags,
		// except for known commonly used flags.
		// If the user knows better, they can manually add their own -c to the gcflags.
		switch flag {
		case "-N", "-l", "-S", "-B", "-C", "-I":
			// OK
		default:
			canDashC = false
			break CheckFlags
		}
	}

	// TODO: Test and delete these conditions.
	if buildcfg.Experiment.FieldTrack || buildcfg.Experiment.PreemptibleLoops {
		canDashC = false
	}

	if !canDashC {
		return 1
	}

	// Decide how many concurrent backend compilations to allow.
	//
	// If we allow too many, in theory we might end up with p concurrent processes,
	// each with c concurrent backend compiles, all fighting over the same resources.
	// However, in practice, that seems not to happen too much.
	// Most build graphs are surprisingly serial, so p==1 for much of the build.
	// Furthermore, concurrent backend compilation is only enabled for a part
	// of the overall compiler execution, so c==1 for much of the build.
	// So don't worry too much about that interaction for now.
	//
	// However, in practice, setting c above 4 tends not to help very much.
	// See the analysis in CL 41192.
	//
	// TODO(josharian): attempt to detect whether this particular compilation
	// is likely to be a bottleneck, e.g. when:
	//   - it has no successor packages to compile (usually package main)
	//   - all paths through the build graph pass through it
	//   - critical path scheduling says it is high priority
	// and in such a case, set c to runtime.GOMAXPROCS(0).
	// By default this is the same as runtime.NumCPU.
	// We do this now when p==1.
	// To limit parallelism, set GOMAXPROCS below numCPU; this may be useful
	// on a low-memory builder, or if a deterministic build order is required.
	c := runtime.GOMAXPROCS(0)
	if cfg.BuildP == 1 {
		// No process parallelism, do not cap compiler parallelism.
		return c
	}
	// Some process parallelism. Set c to min(4, maxprocs).
	if c > 4 {
		c = 4
	}
	return c
}
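The final step of that function is just a small piece of arithmetic: use GOMAXPROCS when -p=1, otherwise min(4, GOMAXPROCS). A minimal sketch of that step as a pure function, where the name capBackendConcurrency and its parameters (maxProcs standing in for runtime.GOMAXPROCS(0), buildP for cfg.BuildP) are hypothetical and not part of cmd/go:

```go
package main

import (
	"fmt"
	"runtime"
)

// capBackendConcurrency restates the last step of gcBackendConcurrency as a
// pure function. maxProcs stands in for runtime.GOMAXPROCS(0) and buildP for
// cfg.BuildP (the -p flag); both parameter names are hypothetical.
func capBackendConcurrency(maxProcs, buildP int) int {
	c := maxProcs
	if buildP == 1 {
		// No process parallelism: do not cap compiler parallelism.
		return c
	}
	// Some process parallelism: use min(4, maxProcs).
	if c > 4 {
		c = 4
	}
	return c
}

func main() {
	fmt.Println(capBackendConcurrency(runtime.GOMAXPROCS(0), 1)) // GOMAXPROCS on this machine
	fmt.Println(capBackendConcurrency(16, 8))                    // 4
	fmt.Println(capBackendConcurrency(2, 8))                     // 2
}
```

Written this way, the hard-coded 4 is the only tuning knob, and it applies whenever cmd/go runs more than one process at a time.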

I asked the compiler team about it, and @rsc pointed out a parallel function in cmd/compile, which ultimately terminates the process if the flags are not compatible:

// concurrentFlagOk reports whether the current compiler flags
// are compatible with concurrent compilation.
func concurrentFlagOk() bool {
	// TODO(rsc): Many of these are fine. Remove them.
	return Flag.Percent == 0 &&
		Flag.E == 0 &&
		Flag.K == 0 &&
		Flag.L == 0 &&
		Flag.LowerH == 0 &&
		Flag.LowerJ == 0 &&
		Flag.LowerM == 0 &&
		Flag.LowerR == 0
}

func concurrentBackendAllowed() bool {
	if !concurrentFlagOk() {
		return false
	}

	// Debug.S by itself is ok, because all printing occurs
	// while writing the object file, and that is non-concurrent.
	// Adding Debug_vlog, however, causes Debug.S to also print
	// while flushing the plist, which happens concurrently.
	if Ctxt.Debugvlog || Debug.Any() || Flag.Live > 0 {
		return false
	}
	// TODO: Test and delete this condition.
	if buildcfg.Experiment.FieldTrack {
		return false
	}
	// TODO: fix races and enable the following flags
	if Ctxt.Flag_shared || Ctxt.Flag_dynlink || Flag.Race {
		return false
	}
	return true
}

Those two functions implement slightly different logic for deciding when concurrent compilation is allowed, and I expect them to drift further apart over time. Moreover, having the compiler reject values of -c greater than 1 when certain flags are in use makes it harder for users to override cmd/go's current hard-coded default for the concurrency limit (which may be a factor in #17751).

The extra complexity in cmd/go also leads to subtle bugs like #48490.

It's obviously reasonable for the compiler to use internal synchronization to avoid races and to produce deterministic output, and at the limit “dropping the concurrency to 1” is equivalent to “using internal synchronization”. So, I think:

  • cmd/go should avoid setting the -c flag explicitly at all, and instead let cmd/compile choose its own reasonable defaults.

  • If cmd/compile detects that other gcflags are incompatible with concurrent compilation, it should drop to sequential compilation on its own. (That is, the -c flag should specify an upper limit, not an exact one.) That way, users can set GOFLAGS=-gcflags=-c=N and not worry about what other flags they might end up needing to pass.

  • Ideally cmd/compile should choose defaults based on GOMAXPROCS rather than a hard-coded constant, since not all packages have uniform shape (especially machine-generated API bindings!) and not all users' workstations have the same number of hardware threads.

CC @josharian @randall77 @mvdan

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. ToolSpeed labels Sep 20, 2021
@bcmills bcmills added this to the Backlog milestone Sep 20, 2021
@cuonglm
Member

cuonglm commented Sep 20, 2021

The extra complexity in cmd/go also leads to subtle bugs like #48490.

I wonder whether #48496 is also a bug of this kind.

@mdempsky
Member

Ideally cmd/compile should choose defaults based on GOMAXPROCS rather than a hard-coded constant, since not all packages have uniform shape (especially machine-generated API bindings!) and not all users' workstations have the same number of hardware threads.

But cmd/go also runs multiple cmd/compile processes concurrently, and we don't want them stepping on each other's toes, right? Is there a good way to have them cooperate?

Or is the idea that cmd/go would set GOMAXPROCS for each subprocess based on what share of CPUs it should utilize?
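The GOMAXPROCS-per-subprocess idea could look roughly like this: divide the CPUs among the -p concurrent compiles and pass the share to each child through the GOMAXPROCS environment variable, which the Go runtime honors at startup. This is a minimal sketch under that assumption; cpuShare and withGOMAXPROCS are hypothetical helpers, not cmd/go code, and the example command is constructed but never run.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strconv"
	"strings"
)

// cpuShare divides numCPU evenly among procs concurrent compiler
// processes, never returning less than 1.
func cpuShare(numCPU, procs int) int {
	if procs < 1 {
		procs = 1
	}
	share := numCPU / procs
	if share < 1 {
		share = 1
	}
	return share
}

// withGOMAXPROCS returns a copy of env with any existing GOMAXPROCS
// entries removed and GOMAXPROCS=n appended.
func withGOMAXPROCS(env []string, n int) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "GOMAXPROCS=") {
			out = append(out, kv)
		}
	}
	return append(out, "GOMAXPROCS="+strconv.Itoa(n))
}

func main() {
	// Assume cmd/go is running 4 compiles at once (-p=4).
	share := cpuShare(runtime.NumCPU(), 4)

	// Illustrative only: the command is built but not started.
	cmd := exec.Command("go", "version")
	cmd.Env = withGOMAXPROCS(os.Environ(), share)
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```

A static split like this is simple but conservative: when the build graph is mostly serial, most of the reserved shares sit idle.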

@bcmills
Contributor Author

bcmills commented Sep 20, 2021

But cmd/go also runs multiple cmd/compile processes concurrently, and we don't want them stepping on each other's toes, right?

Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅

Is there a good way to have them cooperate?

Not sure. In theory we could have some kind of cross-process semaphore, but in practice... that's what the kernel's scheduler is supposed to be doing.

I would expect that ultimately the limiting factor on compilation concurrency would be RAM for parse trees and other per-file data structures, which really don't depend on the number of active threads — but perhaps my expectations are off somewhere.

The conclusion in CL 41192 was:

Given the increased user-CPU costs as
c increases, it looks like c=4 is probably
the sweet spot, at least for now.

However, as far as I can tell the analysis there mostly shows that increasing concurrency wouldn't help for the packages in std and cmd — I don't see anything in those charts that quantifies how much it would actually cost to go higher. (Presumably the “increased user-CPU costs” to which @josharian referred are mainly the costs of creating, scheduling, and destroying the additional threads, but with modern thread implementations those costs should be fairly low to begin with.)

@josharian
Contributor

A few minor observations.

  • We can simplify that complicated function a little by removing the GO19CONCURRENTCOMPILATION env var. Generally speaking, there was a lot of nervousness about unleashing a concurrent compiler on the world, so this is all written more defensively and conservatively than is necessary.
  • As a compiler developer, it's really nice for a plain invocation of go tool compile to have -c=1, because it means that I can add debugging code freely without having to think about concurrency.
  • It has been a long time since I researched this, so my memories are vague. But I recall trying lots of different values of c over many different packages on a 96-core machine and finding almost no benefit to values of c over 4. However, that was a long time ago, and newer hardware may have changed this.
  • Adding concurrency adds not just scheduler overhead but also baseline memory usage, as each SSA worker gets its own fairly large cache.
  • Ideally, someone with access to a variety of large machines would do a bit of benchmarking and let that data inform decisions.

Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅

Because not all of them do their job well, and it's easy for us to control concurrent access. Same answer as for why a lot of tools have semaphores guarding I/O and other syscalls.

If cmd/compile detects that other gcflags are incompatible with concurrent compilation, it should drop to sequential compilation on its own.

This seems eminently sensible, and would provide a lot of simplification. +1

@mdempsky
Member

Maybe? But that's arguably kind of the kernel's job anyway — why are we doing its job for it? 😅

My impression was that you get the best throughput when the number of worker threads is close to the number of CPU cores, and increasing the number of workers past that leads to cache thrashing. E.g., if we already have N cmd/compile processes running across N cores, it doesn't help for the cmd/compile processes to internally switch back and forth between multiple functions; it just reduces memory locality.

@bcmills
Contributor Author

bcmills commented Sep 21, 2021

My impression was that you get the best throughput when the number of worker threads is close to the number of CPU cores, and increasing the number of workers past that leads to cache thrashing.

That matches my intuition, but if the kernel's time-slice granularity is big enough then it shouldn't matter much either way. (For comparison, Go's garbage collector also thrashes the cache, but GCs are infrequent enough that it's not that big a deal.)

@bcmills
Contributor Author

bcmills commented Sep 21, 2021

As a compiler developer, it's really nice for a plain invocation of go tool compile to have -c=1, because it means that I can add debugging code freely without having to think about concurrency.

Maybe? But if we move the choice to the cmd/compile side, perhaps you could instead set something in your environment (like GO19CONCURRENTCOMPILATION=0 or GODEBUG=concurrentcompile=0) to change the default for go tool compile.

(If you invoke the compiler through go build, you can already set GOFLAGS=-gcflags=-c=1.)
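To make the environment-based alternative concrete, here is a minimal sketch of how cmd/compile might consult a GODEBUG-style setting when picking its default. The concurrentcompile key is only the example name suggested above, not an existing setting; godebugValue and defaultBackendConcurrency are hypothetical helpers, and letting the last occurrence of a key win is a choice of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// godebugValue returns the value of key in a comma-separated
// "k1=v1,k2=v2" string, and whether the key was present. In this sketch
// the last occurrence of a key wins.
func godebugValue(godebug, key string) (string, bool) {
	var val string
	found := false
	for _, kv := range strings.Split(godebug, ",") {
		k, v, ok := strings.Cut(kv, "=")
		if ok && k == key {
			val, found = v, true
		}
	}
	return val, found
}

// defaultBackendConcurrency returns 1 if the hypothetical
// concurrentcompile=0 setting is present, and fallback otherwise.
func defaultBackendConcurrency(godebug string, fallback int) int {
	if v, ok := godebugValue(godebug, "concurrentcompile"); ok && v == "0" {
		return 1
	}
	return fallback
}

func main() {
	fmt.Println(defaultBackendConcurrency(os.Getenv("GODEBUG"), 4))
}
```

The objection raised next in the thread applies directly: this only helps developers who remember to set the variable on every machine.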

@mvdan
Member

mvdan commented Sep 21, 2021

Something I wonder about is the "critical path". If the toolchain is able to build 10 packages in parallel, because we have enough CPUs at hand, but one of those packages should be given priority because it is imported by a dozen other packages we also need to build, will it actually be given more CPU time earlier than the other 9?
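One common way to rank packages for this is a longest-path heuristic: a package's priority is the length of the longest chain of packages that transitively wait on it. The sketch below is an illustration of that general technique, not anything cmd/go implements; criticalPathLen, byPriority, and the example graph are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// criticalPathLen returns, for each package, the length of the longest
// chain of packages that transitively wait on it, counting the package
// itself. importers maps a package to the packages that import it.
// The graph is assumed to be acyclic, as Go import graphs are.
func criticalPathLen(importers map[string][]string) map[string]int {
	memo := make(map[string]int)
	var visit func(pkg string) int
	visit = func(pkg string) int {
		if n, ok := memo[pkg]; ok {
			return n
		}
		longest := 0
		for _, imp := range importers[pkg] {
			if n := visit(imp); n > longest {
				longest = n
			}
		}
		memo[pkg] = longest + 1
		return longest + 1
	}
	for pkg := range importers {
		visit(pkg)
	}
	return memo
}

// byPriority orders ready-to-build packages so that those on longer
// critical paths come first, breaking ties by name for determinism.
func byPriority(ready []string, cp map[string]int) []string {
	out := append([]string(nil), ready...)
	sort.Slice(out, func(i, j int) bool {
		if cp[out[i]] != cp[out[j]] {
			return cp[out[i]] > cp[out[j]]
		}
		return out[i] < out[j]
	})
	return out
}

func main() {
	// Hypothetical graph: "base" heads a chain base -> mid -> main,
	// while "leaf1" and "leaf2" are imported only by main.
	importers := map[string][]string{
		"base":  {"mid"},
		"mid":   {"main"},
		"leaf1": {"main"},
		"leaf2": {"main"},
	}
	cp := criticalPathLen(importers)
	fmt.Println(byPriority([]string{"leaf1", "leaf2", "base"}, cp)) // [base leaf1 leaf2]
}
```

A scheduler using this order would start "base" first and could also hand it a larger -c, since everything downstream is blocked on it.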

@josharian
Contributor

Maybe? But if we move the choice to the cmd/compile side, ...

This is something I feel strongly about. I don't want to have to remember to set up a special env var on new machines, and I don't want to have to explain to new contributors about it, or debug what happens when they haven't. Plain go tool compile is currently a fairly friendly tool, and I'd like to keep it that way.

@josharian
Contributor

@mvdan we don't currently have any critical path analysis, but that's a good point—cmd/go is the tool that sees the big picture, and is thus better placed to make resource allocation decisions.

@kunaltyagi

kunaltyagi commented Apr 16, 2022

EDIT: Not relevant to discussion

Would there be a GNU make jobserver-compatible implementation? That would allow polyglot code bases to be compiled without stepping on many toes. This can be an advanced option, not required for 80% of use cases.

Currently, the tooling for C/C++ and Rust already supports it, but Go doesn't, making the build scripts slightly sequential (not a big deal right now). I assume this would also open doors for some very neat integration with on-prem CI (e.g., high utilization without page thrashing due to context switches when total processes > number of cores, which is difficult to control right now).

PS: am I going about it in the wrong thread? I think this is tangentially related to the topic at hand

@ianlancetaylor
Copy link
Contributor

@kunaltyagi This discussion is about the Go tool invoking the compiler, so, yes, I think the idea of adding a GNU make jobserver implementation should be in a different issue. Thanks.

@kunaltyagi
Copy link

Got it, thanks.

I've created a separate issue, and hidden the previous comment to prevent derailing of further conversation

#52387

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Projects
Status: Triage Backlog
Development

No branches or pull requests

8 participants