cmd/go: schedule cgo compilation early #15681

Open
josharian opened this issue May 14, 2016 · 13 comments
Labels
NeedsInvestigation, ToolSpeed
Milestone

Comments

@josharian
Contributor

From a quick scan of the code, it appears that cgo does not depend on a package's dependencies having been compiled. Given that that is the case, and given that cgo (and the resulting C compiler invocations) are generally very slow, it is probably worth scheduling all invocations of cmd/cgo at the very beginning of a build. (As a corollary, this might also mean scheduling the compilation of cmd/cgo itself early.)

cc @ianlancetaylor for input about whether cgo invocations can be safely pushed to the head of the queue.

One downside: It is unclear whether this is just a poor man's version of #8893, which didn't appear to yield much fruit. That could use more investigation.

@josharian added this to the Go1.8 milestone May 14, 2016
@ianlancetaylor
Contributor

I agree that cgo (and SWIG) can be run immediately, without waiting for any Go files to be compiled.

Dmitry's CL for #8893, and your version of it, prove nothing about this one way or the other, as they do not break out the cgo portion of building a package from the rest of building a package. All the cgo support is wrapped up in the same function that compiles the package files: builder.build.

@petermattis

We can also parallelize the C compiler invocations within a package which can give a very big compilation speedup for a cgo heavy project. See https://go-review.googlesource.com/#/c/4931/.
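
A minimal sketch of what per-file parallelism could look like, assuming each C file in a package can be compiled independently. The helper names `runParallel` and `compileC` and the bare `gcc -c` command line are illustrative assumptions; this is not the code from the CL above, and cmd/go passes many more flags to the C compiler.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strings"
	"sync"
)

// runParallel runs every job with at most limit jobs in flight at once
// and returns the first error recorded (nil if all jobs succeed).
func runParallel(jobs []func() error, limit int) error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit jobs are already running
		go func(job func() error) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := job(); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(job)
	}
	wg.Wait()
	return firstErr
}

// compileC returns a job that compiles one C file into an object file.
// Hypothetical helper: the plain "gcc -c" invocation is an assumption.
func compileC(src string) func() error {
	return func() error {
		obj := strings.TrimSuffix(src, ".c") + ".o"
		cmd := exec.Command("gcc", "-c", src, "-o", obj)
		cmd.Stderr = os.Stderr
		return cmd.Run()
	}
}

func main() {
	// Usage (hypothetical): go run . a.c b.c c.c
	var jobs []func() error
	for _, src := range os.Args[1:] {
		jobs = append(jobs, compileC(src))
	}
	if err := runParallel(jobs, runtime.NumCPU()); err != nil {
		fmt.Fprintln(os.Stderr, "compile failed:", err)
		os.Exit(1)
	}
}
```

The semaphore keeps the number of concurrent compiler processes bounded, so a package with hundreds of C files does not oversubscribe the machine.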

@josharian
Contributor Author

Thanks, @petermattis, I was just trying to find that CL. If I do any serious surgery here, I will see about making individual C compilation fine-grained.

@mdempsky
Member

As mentioned in #16623, I propose the opposite approach: instead of trying to push cmd/cgo earlier, let's push the C compilations later.

Currently the only reason we have to wait for the C sources to finish compiling is so we can run cmd/cgo -dynimport and generate a bunch of //go:cgo_import_dynamic directives. But cmd/compile doesn't actually need these directives: it simply stashes them into the compiled package artifact so cmd/link can find them.

If cmd/go were responsible for saving the directives instead, we could run cmd/compile immediately after the first cmd/cgo run. Then Go package compilation would never be blocked waiting on C compilations, and C compilations could all run in parallel, blocking only the link operations that depend on them.

@josharian
Contributor Author

> let's push the C compilations later

C compilation is slow. Don't we want it to be as early as possible, so that it isn't the lone straggler at the end?

I agree that it'd be very good to not have to wait for cgo/C to be done to start compiling Go, I just want to make sure we don't push C to the end as a consequence of that.

@mdempsky
Member

I don't mean C compilations need to be delayed per se, but they're trivially parallelizable and nothing fundamentally depends on them except for the linker. On the other hand, Go compilations do necessarily depend on other compilations, so it seems beneficial to prioritize scheduling them to unblock more work. E.g., in your graph for #15734, we have a long bottleneck for package runtime at the beginning, but it looks like we have more than enough idle CPUs later in the build to handle the C compilations.

My hypothesis is that if we're able to 1) remove the unnecessary dependency from Go compilations on C compilations, and 2) implement something like #15734; then the Go dependency graph's scheduling delays should essentially flatten, allowing us to naturally schedule C compilations earlier.

I suppose what would be really beneficial here is to collect fine-grained trace timing data, and then analyze how much more optimally it could have been scheduled if we relax various dependencies.
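
Before real trace data exists, the idea can be tried on a toy model. The sketch below simulates a greedy scheduler over a small hypothetical build graph and compares the total build time with and without the Go-on-C dependency. The package names, durations, and the `makespan` and `toyGraph` helpers are all made up for illustration; they are not measurements and not cmd/go's scheduler.

```go
package main

import "fmt"

// task is one node of a hypothetical build graph.
type task struct {
	name string
	dur  int      // duration in arbitrary time units (must be >= 1)
	deps []string // names of tasks that must finish first
}

const (
	waiting = iota
	running
	done
)

// makespan simulates greedy scheduling on workers CPUs: whenever a CPU
// is free it starts the ready task listed earliest in tasks. It returns
// the time at which the last task finishes. It assumes workers >= 1 and
// an acyclic graph whose deps all name tasks in the slice.
func makespan(tasks []task, workers int) int {
	state := map[string]int{}
	finish := map[string]int{}
	busy, finished := 0, 0
	for t := 0; ; t++ {
		// Retire tasks that have finished by time t.
		for _, tk := range tasks {
			if state[tk.name] == running && finish[tk.name] <= t {
				state[tk.name] = done
				busy--
				finished++
			}
		}
		if finished == len(tasks) {
			return t
		}
		// Start ready tasks, in slice order, while CPUs are free.
		for _, tk := range tasks {
			if busy == workers {
				break
			}
			if state[tk.name] != waiting {
				continue
			}
			ready := true
			for _, d := range tk.deps {
				if state[d] != done {
					ready = false
					break
				}
			}
			if ready {
				state[tk.name] = running
				finish[tk.name] = t + tk.dur
				busy++
			}
		}
	}
}

// toyGraph models a cgo package a with a chain b, c of Go packages
// importing it. With relaxed set, compiling a's Go code no longer
// waits for a's C compilation. All durations are invented.
func toyGraph(relaxed bool) []task {
	goDeps := []string{"a: cgo", "a: gcc"}
	if relaxed {
		goDeps = []string{"a: cgo"}
	}
	return []task{
		{"a: cgo", 1, nil},
		{"a: gcc", 10, []string{"a: cgo"}},
		{"a: compile", 2, goDeps},
		{"b: compile", 3, []string{"a: compile"}},
		{"c: compile", 3, []string{"b: compile"}},
		{"link", 1, []string{"c: compile", "a: gcc"}},
	}
}

func main() {
	fmt.Println("current graph:", makespan(toyGraph(false), 2))
	fmt.Println("relaxed graph:", makespan(toyGraph(true), 2))
}
```

With these invented durations and two workers, the simulation reports 20 time units for the current graph and 12 for the relaxed one, because the Go compilations overlap with the C compilation instead of queueing behind it. Feeding the same function durations taken from a real trace would be the more meaningful experiment.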

@josharian
Contributor Author

> I don't mean C compilations need to be delayed per se

Ack

Seems plausible. Definitely worth a run. The cmd/go tracing stuff is near the bottom of my list of pending CLs to get mailed/fixed/submitted, but I will get to them eventually. :) But I think the data about C and cgo is probably clear enough already that we can just move forward with your approach.

@quentinmit added the NeedsInvestigation label Oct 6, 2016
@rsc modified the milestones: Unplanned, Go1.8 Oct 21, 2016
@pwaller
Contributor

pwaller commented Apr 5, 2017

Pinging the thread while I wait here for 31 idle cores to complete a several minute CGo compilation... 🍅

@petermattis

@pwaller If you're willing to rebuild your Go toolchain (which is really quite easy), see https://github.com/cockroachdb/cockroach/blob/master/build/parallelbuilds-go1.8.patch. The patch applies cleanly to go1.8 and we use it for development of CockroachDB.

@mdempsky
Member

I played with this a little last night. In particular, I changed cgo -dynimport to directly write a dummy .o object file with embedded cgo directives (as opposed to writing out a .go file), then cmd/go just needs to append it to the .a archive. cmd/link automatically does the right thing.

The main limiting factor then is that cmd/go's build graph currently uses a single monolithic Action to represent package compilation, but ideally we'd separate cgo compilation into (at least) two Actions:

  1. Run cgo to generate .c and .go files, and run go tool compile -out pkg.a -linkobj pkg.la pkg/*.go. At this point, pkg.a (the compiler's export data output) is fully ready for downstream Go compilations.
  2. Run cmd/asm and gcc to compile all the other compilation units within the package, and append them to the .la file. Once those are all done, the .la file is ready for any dependent cmd/link operations.

The actions could probably be broken down further (e.g., to parallelize the C compilations) or given finer-grained dependencies (e.g., using gcc's -M flag to recognize when individual .o files need to be rebuilt), but that first split is probably the biggest win for most cgo users.
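
The two-action split can be sketched as a tiny dependency graph. The `action` type, `newAction`, and `buildOrder` below are hypothetical stand-ins, not cmd/go's real build graph, and the step bodies only record their names instead of running any tools. The slow C compilation is simulated by making it wait until the downstream Go compile has finished, which would deadlock if the downstream compile depended on it.

```go
package main

import (
	"fmt"
	"sync"
)

// action is a hypothetical build-graph node, not cmd/go's real type.
type action struct {
	deps []*action
	run  func()
	once sync.Once
}

func newAction(run func(), deps ...*action) *action {
	return &action{run: run, deps: deps}
}

// execute runs a after all of its dependencies. Independent
// dependencies run concurrently, and each action runs at most once
// even when several dependents request it.
func (a *action) execute() {
	a.once.Do(func() {
		var wg sync.WaitGroup
		for _, d := range a.deps {
			wg.Add(1)
			go func(d *action) {
				defer wg.Done()
				d.execute()
			}(d)
		}
		wg.Wait()
		a.run()
	})
}

// buildOrder wires up the split graph and returns the order in which
// the steps completed.
func buildOrder() []string {
	var (
		mu  sync.Mutex
		log []string
	)
	record := func(step string) {
		mu.Lock()
		log = append(log, step)
		mu.Unlock()
	}
	downstreamCompiled := make(chan struct{})

	// Action 1: cgo plus the Go compile; pkg.a is ready afterwards.
	export := newAction(func() { record("pkg: cgo + go tool compile") })
	// Action 2: everything else in the package, needed only by the linker.
	objects := newAction(func() {
		<-downstreamCompiled // stand-in for a slow gcc run
		record("pkg: cmd/asm + gcc")
	}, export)
	// A package importing pkg depends on action 1 only.
	downstream := newAction(func() {
		record("downstream: go tool compile")
		close(downstreamCompiled)
	}, export)
	link := newAction(func() { record("link") }, downstream, objects)

	link.execute()
	return log
}

func main() {
	for i, step := range buildOrder() {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

The point of the sketch is the edge list: the downstream compile lists only the first action as a dependency, while the link step waits for both.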

@elgatito

@petermattis your link is not available. Do you have it in the history anywhere?

I also have to wait a long time while a single core compiles bindings for a big C++ library.

@petermattis

@elgatito https://github.com/cockroachdb/cockroach/blob/aa22f2140f0078ea9c6a43f29a87cf24471ea0e8/build/parallelbuilds-go1.8.patch. We no longer use this patch as we've moved away from using the go tool to compile C++, and now rely on cmake.

@gopherbot

Change https://golang.org/cl/328712 mentions this issue: cmd/cgo: output cgo pragmas as object instead of go
