Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: read and process import data lazily #20070

Closed
rsc opened this issue Apr 21, 2017 · 10 comments
Closed

cmd/compile: read and process import data lazily #20070

rsc opened this issue Apr 21, 2017 · 10 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge ToolSpeed
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Apr 21, 2017

As an experiment, I changed the compiler to report the number of packages, types, and functions built up as a result of processing imports for a given package:

$ git diff 
diff --git a/src/cmd/compile/internal/gc/bimport.go b/src/cmd/compile/internal/gc/bimport.go
index ea9c02dea8..4fee3eee41 100644
--- a/src/cmd/compile/internal/gc/bimport.go
+++ b/src/cmd/compile/internal/gc/bimport.go
@@ -15,6 +15,7 @@ import (
 	"encoding/binary"
 	"fmt"
 	"math/big"
+	"os"
 	"strconv"
 	"strings"
 )
@@ -220,6 +221,10 @@ func Import(imp *types.Pkg, in *bufio.Reader) {
 
 	p.verifyTypes()
 
+	if os.Getenv("IMPORTDEBUG") != "" {
+		fmt.Fprintf(os.Stderr, "import %v: %d pkg + %d type + %d func\n", imp.Path, len(p.pkgList), len(p.typList), len(p.funcList))
+	}
+
 	// --- end of export data ---
 
 	typecheckok = tcok
$ 

This program has no imports, so no overhead:

$ cat x.go
package main

func main() {
	println("hello world")
}
$ IMPORTDEBUG=1 go build x.go
$

This program imports fmt, which turns out to mean processing 38 types and 20 functions, only one of which is needed:

$ cat x.go
package main

import "fmt"

func main() {
	fmt.Println("hello world\n")
}
$ IMPORTDEBUG=1 go build x.go
# command-line-arguments
import fmt: 2 pkg + 38 type + 20 func
$

This program imports net/http, which turns out to mean processing 231 types and 857 functions, only one of which is needed:

$ cat x.go
package main

import "net/http"

func main() {
	http.ListenAndServe(":8080", nil)
}
$ IMPORTDEBUG=1 go build x.go
# command-line-arguments
import net/http: 29 pkg + 231 type + 857 func
$

The overhead here is more than 100X. Many real-world packages are even bigger than the ones in the standard library.

In a typical source file or package, especially during an incremental build, I suspect the compiler spends significantly more type loading and type-checking all these unnecessary imported definitions than it does the necessary one and the actual source file.

Now that we have a clean encapsulation of import/export format handling, it seems like the next step should be to make the import/export format indexed in some way, so that it can be consulted on demand during the compilation rather than loaded entirely up front. That is, when you import "fmt" you'd open the export data for fmt, and then when you get to fmt.Println you'd pull in just what you need to understand fmt.Println (in this case, likely just the function type).

Similarly, for the HTTP server, invoking http.ListenAndServe would mean pulling in the function type and possibly then also learning about http.Handler, since that is the type of the second argument. That might in turn require learning about some more types, since Handler's ServeHTTP takes an http.ResponseWriter and a *http.Request. The latter is a struct with many other types, and it might require loading some large number of other types. Ideally this could be cut off through more lazy loading (none of this is required to type check that the argument
nil is in fact a *http.Request). Even if not, though, the total number of types and functions
should still be smaller than before.

The problem will only get worse as packages grow. For protobuf-generated code, where the export data summarizes essentially the entirety of all imported packages recursively, not loading this information into the compiler until it is needed would be a big win.

Even in the main repo, there are indications this could be a big win. Consider a full build of cmd/go (just to take a decent-sized program) and its dependencies, compiled with inlining completely disabled:

$ for i in $(seq 5); do 9 time go build -o go.exe -gcflags -l=1 -a cmd/go; sleep 5; done
14.56u 2.70s 6.65r 	 go build -o go.exe -gcflags ...
14.69u 2.71s 6.70r 	 go build -o go.exe -gcflags ...
14.84u 2.74s 6.83r 	 go build -o go.exe -gcflags ...
14.91u 2.74s 6.88r 	 go build -o go.exe -gcflags ...
14.76u 2.72s 6.74r 	 go build -o go.exe -gcflags ...
$ 

Here's the same full build with our current default inlining:

$ for i in $(seq 5); do 9 time go build -o go.exe -gcflags -l=0 -a cmd/go; sleep 5; done
16.51u 2.80s 7.31r 	 go build -o go.exe -gcflags ...
16.79u 2.86s 7.27r 	 go build -o go.exe -gcflags ...
16.97u 2.88s 7.37r 	 go build -o go.exe -gcflags ...
17.03u 2.88s 7.41r 	 go build -o go.exe -gcflags ...
17.11u 2.86s 7.38r 	 go build -o go.exe -gcflags ...
$ 

And here's the same full build with the more aggressive mid-stack inlining:

$ for i in $(seq 5); do 9 time go build -o go.exe -gcflags -l=4 -a cmd/go; sleep 5; done
24.09u 3.26s 9.53r 	 go build -o go.exe -gcflags ...
24.08u 3.25s 9.28r 	 go build -o go.exe -gcflags ...
24.59u 3.30s 9.61r 	 go build -o go.exe -gcflags ...
24.39u 3.26s 9.52r 	 go build -o go.exe -gcflags ...
24.63u 3.28s 9.52r 	 go build -o go.exe -gcflags ...
$ 

The difference here is that more inlining means more function bodies that must be imported and type-checked. My guess is that, for any single package being compiled, nearly all of the function bodies included in export data are irrelevant (but which ones are irrelevant changes depending
on who is importing the package, of course). Under the assumption that on-demand processing of the function bodies would make -l=0 (inlining on) behave more like -l=1 (inlining off), that would reduce build times by about 10% (6.65s vs 7.27s). It would also keep the more aggressive inlining from increasing build times by 25% (9.28s vs 7.27s).

I'm assuming here that the increase in time is due to importing the new data and not writing it out. I haven't checked that, but the export data is typically read back many times but only written out once (or zero times) during a given build, so I think that's a reasonable assumption.

It's obviously too late for Go 1.9, but it seems like this would be an effective way to speed up the compiler in the Go 1.10 cycle.

/cc @aclements @griesemer @mdempsky

@rsc rsc added this to the Go1.10Early milestone Apr 21, 2017
@josharian
Copy link
Contributor

josharian commented Apr 21, 2017

@mdempsky and I were just discussing this as well. I noticed it with the cmd/compile/internal/ARCH packages, whose compilation time appears to be dominated by reading the import data for the ssa package, which has lots of export data. @mdempsky was going to look into whether we can reduce the size of that export data by making better decisions about which functions might potentially be inlined by other packages, which might (maybe) happen for 1.9, and which would help both writing and reading.

One other bit of anecdotal evidence that this matters came from @davecheney's observation that the biggest compile time speedup last cycle came from switching to the binary export format.

@griesemer
Copy link
Contributor

Nice observation. I'd argue that in a more realistic scenario, a single package probably uses more functionality from an imported package; and that same package may be imported multiple times by a larger package (and then the import happens only the very first time), so the overhead is present but perhaps not as extreme. But there's no denying that this doesn't scale well and needs to be addressed eventually.

Here are some approaches that come to mind:

  1. It should be relatively straight-forward to not read data for exported functions if they are not used. They are already in a separate section. The only thing we have to be careful about is to export types that are referred to from such functions (the original exported did that - the new format didn't have to because it could "inline" new type declarations). Such an approach should readily cut down the amount of data that needs to be read in and type-checked. Inlined function bodies produce a lot of nodes. (As an aside, ideally no type-checking should be needed for imported data - it was type-checked before. But that's a longer-term issue. And there's a tension between re-type-checking and exporting addition type information.)

  2. Creating a more general index requires a bit more thought.

  3. Longer-term: If we move to a model where installed packages are in some form of data base, export data could be in a separate data base which is suitable for quick lookup.

I suspect 1) could be done even for 1.9 if we permit say 1 or 2 weeks extra past the official freeze for this.

@josharian
Copy link
Contributor

Related: #15734 and #15752.

@josharian
Copy link
Contributor

The difference here is that more inlining means more function bodies that must be imported and type-checked.

True, but the importing and type-checking is not be the whole story. Inlining also means that functions get larger, which means more nodes, more time in the backend, more progs, etc. I'd wager that those effects are also non-trivial.

I still think this is a good idea, I'm just not sure that the inlining data is clear evidence for it, or that it will necessarily make mid-stack inlining much cheaper, compilation-wise.

@griesemer
Copy link
Contributor

One more thing: I suspect it's a common scenario that a package A, which imports a package C, partially exports pieces of C in the export data of A. Another package B may do the same, partially exporting a part of C, maybe the same as A did, maybe a different part.

A client of A and B now needs to import multiple parts of C, possibly different ones, possibly overlapping ones. Across a sufficiently large program it could be much more efficient to import C in full exactly once.

@dr2chase
Copy link
Contributor

dr2chase commented Apr 24, 2017

Could this finer-grained importing be turned around to create finer-grained dependences? I worked on a Brand X The-Java-Programming-Language™ bytecode-to-native static compiler that did this down to the field-offset level, and it was nice. Obviously we can end up spending more time on dependence paperwork than redoing the work, but perhaps there's a sweet spot at a finer granularity than current.

@griesemer
Copy link
Contributor

@dr2chase There's a separate issue for that #15752.

@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
@griesemer
Copy link
Contributor

Ongoing work. For 1.11.

@griesemer griesemer modified the milestones: Go1.10, Go1.11 Nov 29, 2017
@josharian
Copy link
Contributor

@mdempsky ready to call this fixed?

@mdempsky
Copy link
Member

mdempsky commented May 1, 2018

Yeah, I think this is substantially addressed by ca2f85f.

There's still more work that can be done to make type loading lazy. I think we can track that in #25056.

@mdempsky mdempsky closed this as completed May 1, 2018
@golang golang locked and limited conversation to collaborators May 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge ToolSpeed
Projects
None yet
Development

No branches or pull requests

7 participants