Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/packages: Load is 2-3 orders of magnitude slower than go/build.Import #31087

Closed
thockin opened this issue Mar 27, 2019 · 22 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Tools This label describes issues relating to any tools in the x/tools repository. ToolSpeed
Milestone

Comments

@thockin
Copy link

thockin commented Mar 27, 2019

As per @heschik in #29452

Converting a tool from go/build to go/packages incurs a huge slowdown. I tried to min-repro it here. Note that we have a full vendor/ dir, no un-vendored deps, and modules are NOT turned on yet.

What version of Go are you using (go version)?

1.12.1

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

linux, amd64

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/thockin/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/usr/local/google/home/thockin/src/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/google-golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/google-golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build445939239=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Build and run the two programs below and time them.

What did you expect to see?

Roughly the same timing

What did you see instead?

0.008s vs 1.272s

go-build.go:

package main

import (
        "fmt"
        "go/build"
        "log"
        "os"

        "github.com/davecgh/go-spew/spew"
)

func main() {
        if err := visit("."); err != nil {
                fmt.Println(err)
                os.Exit(1)
        }
}

func visit(pkgName string) error {
        pkg, err := findPackage(pkgName)
        if err != nil {
                return err
        }
        spew := spew.ConfigState{DisableMethods: true, Indent: "  "}
        spew.Dump(pkg)
        return nil
}

func findPackage(pkgName string) (*build.Package, error) {
        log.Println("find", pkgName)
        pkg, err := build.Import(pkgName, getwd(), build.ImportComment)
        if err != nil {
                return nil, err
        }
        return pkg, nil
}

func getwd() string {
        pwd, err := os.Getwd()
        if err != nil {
                panic(fmt.Sprintf("can't get working directory: %v", err))
        }
        return pwd
}

go-packages.go:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/davecgh/go-spew/spew"

	"golang.org/x/tools/go/packages"
)

func main() {
	if err := visit("."); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

func visit(pkgName string) error {
	pkgs, err := findPackage(pkgName)
	if err != nil {
		return err
	}
	spew := spew.ConfigState{DisableMethods: true, Indent: "  "}
	spew.Dump(pkgs[0])
	return nil
}

func findPackage(pkgName string) ([]*packages.Package, error) {
	log.Println("find", pkgName)
	pkgs, err := packages.Load(&packages.Config{Mode: packages.LoadFiles}, pkgName)
	if err != nil {
		return nil, err
	}
	return pkgs, nil
}

Testing:

$ ls
go-build.go  go-packages.go

$ go build ./go-build.go 

$ go build ./go-packages.go 

$ time ./go-build >/dev/null
2019/03/27 11:51:10 find .

real	0m0.008s
user	0m0.008s
sys	0m0.000s

$ time ./go-packages >/dev/null
2019/03/27 11:51:17 find .

real	0m1.272s
user	0m0.094s
sys	0m0.109s
@thockin thockin changed the title go/packes is 50x slower than go/build go/packages is 50x slower than go/build Mar 27, 2019
@thockin thockin changed the title go/packages is 50x slower than go/build go/packages is 2-3 orders of magnitude slower than go/build Mar 27, 2019
@matloob
Copy link
Contributor

matloob commented Mar 27, 2019

This was discussed offline: The slowness was due to the Google-specific gopackagesdriver binary.

Unfortunately go/packages will always be slower than go/build because we need to execute go list as the true authority of package information, but this kind of slowness is unexpected and really bad. We'll get a google-specific fix in soon, but for those of you facing this problem, you can set GOPACKAGESDRIVER=off in the short term to bypass that driver.

@thockin
Copy link
Author

thockin commented Mar 27, 2019

Thanks. I am still concerned at "go/packages will always be slower than go/build" even when we have vendored EVERYTHING. I will hang tight for the followup here, and try a proper rewrite of my real tool against go/packages (rather than a trivial adaptation as I did)

@heschi
Copy link
Contributor

heschi commented Mar 27, 2019

re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.

There's some amount of overhead involved in running go list, then marshalling and unmarshalling the results. In my experience that's usually small, say 50-100ms or so. That can be a problem for highly interactive tools like goimports and godef, but for a build tool like go2make I think it'd be okay.

After that the major issue is that packages.Load starts from scratch every time, so it's important to keep the number of calls low to avoid loading the same packages (e.g. the stdlib) over and over and over. That can be really easy, or require major refactorings. But for most people I don't think it's been a big problem.

@thockin
Copy link
Author

thockin commented Mar 28, 2019 via email

@josharian
Copy link
Contributor

Note that issues like #30826 and #30828 make it harder to avoid calling packages.Load multiple times. go-fuzz-build calls packages.Load three times, and properly dealing with the referenced issues (which I am currently hacking around in a way that will work until it doesn't) will require a fourth and possibly a fifth call.

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Apr 12, 2019
@bcmills bcmills added this to the Unreleased milestone Apr 12, 2019
@bcmills bcmills changed the title go/packages is 2-3 orders of magnitude slower than go/build go/packages: Load is 2-3 orders of magnitude slower than go/build.Import Apr 15, 2019
@gopherbot
Copy link

Change https://golang.org/cl/170863 mentions this issue: cmd/gopherbot: CC triaged issues to owners

@dmitshur dmitshur changed the title go/packages: Load is 2-3 orders of magnitude slower than go/build.Import x/tools/go/packages: Load is 2-3 orders of magnitude slower than go/build.Import Apr 16, 2019
@gopherbot gopherbot added the Tools This label describes issues relating to any tools in the x/tools repository. label Sep 12, 2019
@thockin
Copy link
Author

thockin commented Dec 22, 2020

I'm baaaaack!

This bug started with me trying to convert from go/build to go/packages. Trivial conversion was not going to work, so I back-burnered it. But now even go/build is slow (not sure when it happened). I noticed that we still set GO111MODULE=off in some paths and thought I would nix that. Uh oh.

The below program runs very fast with GO111MODULE=off and very slow with it on. Same binary, same inputs, 100x slower.

$ time GO111MODULE=off /tmp/m/k/slow -d ./pkg/proxy/iptables/ | wc -l
3357

real	0m0.286s
user	0m0.166s
sys	0m0.112s

$ time GO111MODULE=on /tmp/m/k/slow -d ./pkg/proxy/iptables/ | wc -l
3357

real	0m37.591s
user	0m37.205s
sys	0m11.689s

From before:

re: always slower, it's true but I doubt it'll be a big problem for Kubernetes' build tools.

It looks like there's an internal switch that happened at some point and without GO111MODULE=off, it uses the slow path. But HOLY COW is that slow.

The problem is that this tool runs on every Kubernetes build. In the legacy mode it takes 3-5 seconds for the whole repo, which is acceptable. Adding 5+ minutes to each run (even just a small incremental run) is obviously not viable.

So I guess the question I have is - why is this OK? Am I just so far outside the norm that 100x slowdown is acceptable in exchange for some (presumably) internal cleanup?

Can I rely on GO111MODULE=off to be fast forever? Or should I plan to totally overhaul how Kubernetes builds work?

Is there a better answer?

Program:

package main

import (
	"flag"
	"fmt"
	"go/build"
	"os"
	"path"
	"sort"
)

var flDebug = flag.Bool("d", false, "enable debugging output")

func main() {
	flag.Parse()

	debug("PWD", getwd())

	build.Default.BuildTags = []string{"ignore_autogenerated"}
	build.Default.UseAllFiles = false

	if len(flag.Args()) == 0 {
		os.Exit(1)
	}
	// Recurse.
	pkgName := flag.Args()[0]
	debug("starting", pkgName)
	if err := WalkPkg(pkgName, visitPkg); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func debug(items ...interface{}) {
	if *flDebug {
		x := []interface{}{"DBG:"}
		x = append(x, items...)
		fmt.Println(x...)
	}
}

func visitPkg(importPath, absPath string) error {
	debug("visit", importPath)
	return saveImport(importPath)
}

// cache keeps track of which packages we have already loaded.
var cache = map[string]*build.Package{}

func saveImport(pkgName string) error {
	if cache[pkgName] != nil {
		return nil
	}
	pkg, err := loadPackage(pkgName)
	if err != nil {
		return err
	}
	debug("save", pkgName)
	cache[pkgName] = pkg

	debug("recurse", pkgName)
	defer func() { debug("done   ", pkgName) }()
	if !pkg.Goroot && (len(pkg.GoFiles)+len(pkg.Imports) > 0) {
		// Process deps of this package before the package itself.
		for _, impName := range pkg.Imports {
			if impName == "C" {
				continue
			}
			debug("depends on", impName)
			saveImport(impName)
		}
	}
	return nil
}

func loadPackage(pkgName string) (*build.Package, error) {
	debug("load", pkgName)
	pkg, err := build.Import(pkgName, getwd(), 0)
	if err != nil {
		// We can ignore NoGoError.  Anything else is real.
		if _, ok := err.(*build.NoGoError); !ok {
			return nil, err
		}
	}
	return pkg, nil
}

func getwd() string {
	pwd, err := os.Getwd()
	if err != nil {
		panic(fmt.Sprintf("can't get working directory: %v", err))
	}
	return pwd
}

// VisitFunc is a function called by WalkPkg to examine a single package.
type VisitFunc func(importPath string, absPath string) error

// WalkPkg recursively visits all packages under pkgName.  This is similar
// to filepath.Walk, except that it follows symlinks.  A package is always
// visited before the children of that package.
func WalkPkg(pkgName string, visit VisitFunc) error {
	// Visit the package itself.
	pkg, err := findPackage(pkgName)
	if err != nil {
		return err
	}
	if err := visit(pkg.ImportPath, pkg.Dir); err != nil {
		return err
	}

	// Read all of the child dirents and find sub-packages.
	infos, err := readDirInfos(pkg.Dir)
	if err != nil {
		return err
	}
	for _, info := range infos {
		if !info.IsDir() {
			continue
		}
		name := info.Name()
		if name[0] == '_' || (len(name) > 1 && name[0] == '.') || name == "testdata" {
			continue
		}
		// Don't use path.Join() because it drops leading `./` via path.Clean().
		err := WalkPkg(pkgName+"/"+name, visit)
		if err != nil {
			return err
		}
	}
	return nil
}

// findPackage finds a Go package.
func findPackage(pkgName string) (*build.Package, error) {
	debug("find", pkgName)
	pkg, err := build.Import(pkgName, getwd(), build.FindOnly)
	if err != nil {
		return nil, err
	}
	return pkg, nil
}

// readDirInfos returns a list of os.FileInfo structures for the dirents under
// dirPath.  The result list is sorted by name.  This is very similar to
// ioutil.ReadDir, except that it follows symlinks.
func readDirInfos(dirPath string) ([]os.FileInfo, error) {
	names, err := readDirNames(dirPath)
	if err != nil {
		return nil, err
	}
	sort.Strings(names)

	infos := make([]os.FileInfo, 0, len(names))
	for _, n := range names {
		info, err := os.Stat(path.Join(dirPath, n))
		if err != nil {
			return nil, err
		}
		infos = append(infos, info)
	}
	return infos, nil
}

// readDirNames returns a list of all dirents in dirPath.  The result list is
// not sorted or filtered.
func readDirNames(dirPath string) ([]string, error) {
	d, err := os.Open(dirPath)
	if err != nil {
		return nil, err
	}
	defer d.Close()

	names, err := d.Readdirnames(-1)
	if err != nil {
		return nil, err
	}
	return names, nil
}

@mvdan
Copy link
Member

mvdan commented Jan 15, 2021

@thockin Just from a quick look, it seems like the go/build usage is like:

  1. walk a directory tree
  2. for every directory, find the package found in it
  3. "visit" the package, by loading it and its dependencies, with a memoizing cache

I don't quite understand what the program intends to accomplish, other than doing the loading and printing debug logs. I assume it's just for the sake of illustrating the slowness.

From a tooling point of view, the strategy above is really slow. To support modules, both go/packages and go/build execute the go list -json tool to load packages. So, at least, you are running go list twice per package. Even with the help of the module and build caches, it's not surprising the work takes tens of seconds; you'd be running go list close to a thousand times:

$ # how many packages does ./pkg/proxy/iptables depend on?
$ go list -deps ./pkg/proxy/iptables | wc -l
405

I think it would be more helpful if you explained what exactly your program is trying to do. I'm pretty convinced you could load all the packages you need in one single underlying go list call, similar to how I was able to use go list -deps above to list all the transitive dependencies in about 200ms.

@mvdan
Copy link
Member

mvdan commented Jan 15, 2021

Also, I'm happy to take this somewhere else if you would like to discuss your program in more detail. GitHub is not a very good place to provide help of this kind :)

@thockin
Copy link
Author

thockin commented Jan 16, 2021

I don't quite understand what the program intends to accomplish, other than doing the loading and printing debug logs. I assume it's just for the sake of illustrating the slowness.

Precisely. The real program looks at all loaded packages and emits some Makefile snippets for them, but that did not seem relevant to demonstrating the problem.

From a tooling point of view, the strategy above is really slow. To support modules, both go/packages and go/build execute the go list -json tool to load packages. So, at least, you are running go list twice per package. Even with the help of the module and build caches, it's not surprising the work takes tens of seconds; you'd be running go list close to a thousand times

It was not slow when it was written. It's not slow with GO111MODULE=off. Modules totally borked it.

I think it would be more helpful if you explained what exactly your program is trying to do. I'm pretty convinced you could load all the packages you need in one single underlying go list call,

This is for Kubernetes. I have a really slow codegen tool that I want to run ONLY if the packages it consumes have ACTUALLY changed (mtime is a good enough proxy). To make it harder, I don't know a priori which packages need codegen. I want to scan the file tree, find the files that need codegen, check their deps, and only then run the codegen (actually we have several codegens in the same mold).

A single go list would be ideal except that it barfs on module edges and OF COURSE kubernetes has multiple modules. I modelled the k8s codebase at https://github.com/thockin/go-list-modules. What happens is:

$ DIRS_NEED_CODEGEN=$(\
    find . -type f -name \*.go \
        | xargs grep -l "^// *NEEDS_CODEGEN" \
        | xargs -n 1 dirname \
        | sort \
        | uniq \
        | paste -sd' '
)

$ go list -f '{{.GoFiles}}' $DIRS_NEED_CODEGEN
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other1/unused
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other1/used
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other2/unused
main module (example.com/m) does not contain package example.com/m/staging/src/example.com/other2/used
main module (example.com/m) does not contain package example.com/m/submod/unused
main module (example.com/m) does not contain package example.com/m/submod/used

I started writing a "smarter" tool to only call packages.Load() one time, but that won't load across modules ("main module does not contain", requires CWD to be set to the module dir), so I can't do more than one at a time.

I have a thread going on golang-nuts on this, and @jayconrod opened #43733 as part of that. I admit that I am not 100% sure the fix there would fix me, but it seems like it would. There's terminology and philosophy around modules that I don't quite understand, so it's hard for me to say what is expected and what is a bug. What I know is that I keep running into dead-ends. Things I would expect to work, don't.

@mvdan
Copy link
Member

mvdan commented Jan 16, 2021

Modules totally borked it.

When I say this is slow, I mean in the context of modules. Some tools that got away with being fast in the GOPATH world will need to be re-thought in the world of modules to remain fast, because modules are more powerful but also more complex. I know that can be frustrating, which is why I'm trying to help.

A single go list would be ideal except that it barfs on module edges and OF COURSE kubernetes has multiple modules.

Right, I was not aware the single repository contained many modules. It seems like you could use one go list call (aka one go/packages.Load) per module, using the right directory for each module.

I have a really slow codegen tool that I want to run ONLY if the packages it consumes have ACTUALLY changed (mtime is a good enough proxy)
[...]
I started writing a "smarter" tool to only call packages.Load() one time, but that won't load across modules

If you can link to that work-in-progress software, as well as the existing program that currently works, I can take a look and try to come up with a working version that's fast enough with modules. Your summary above is helpful, but ultimately I can't properly help until I see the code.

@thockin
Copy link
Author

thockin commented Jan 16, 2021

It seems like you could use one go list call (aka one go/packages.Load) per module, using the right directory for each module.

I tried that - it takes forever. There are a lot of cross-references and some very large modules end up being processed repeatedly. What used to take 2-4 seconds now takes 45. :(

If you can link to that work-in-progress software, as well as the existing program that currently works, I can take a look

I appreciate the effort!

Existing: https://github.com/kubernetes/kubernetes/tree/master/hack/make-rules/helpers/go2make

New one is very cobbled together right now as I was trying to find some way to make it produce correct results without resorting to GO111MODULE=off. Will paste a file here. Run as go2make2 -m <modules...> or go2make2 -p <packages...>. Set GO111MODULE=off to see it behave differently.

package main

import (
	"fmt"
	"os"
	"path"
	"path/filepath"
	"sort"
	"strings"
	"time"

	"github.com/davecgh/go-spew/spew"
	"github.com/spf13/pflag"
	"golang.org/x/tools/go/packages"
)

var pkgIdx = []string{} // sorted
var pkgMap = map[string]*packages.Package{}
var pkgDeps = map[string]map[string]bool{} // transitive
var pkgDirs = map[string]string{}

var flOut = pflag.StringP("out", "o", "struct", "What to output (struct | makevars | spew)")
var flMod = pflag.BoolP("mod", "m", false, "process the arguments as modules")
var flPkg = pflag.BoolP("pkg", "p", false, "process the arguments as packages")
var flDbg = pflag.BoolP("debug", "d", false, "enable debugging output")
var flDbgTime = pflag.BoolP("debug-time", "D", false, "enable debugging output with timestamps")

var lastDebugTime time.Time

func debug(items ...interface{}) {
	if *flDbg || *flDbgTime {
		x := []interface{}{}
		if *flDbgTime {
			elapsed := time.Since(lastDebugTime)
			if lastDebugTime.IsZero() {
				elapsed = 0
			}
			lastDebugTime = time.Now()
			x = append(x, fmt.Sprintf("DBG(+%v):", elapsed))
		} else {
			x = append(x, "DBG:")
		}
		x = append(x, items...)
		fmt.Fprintln(os.Stderr, x...)
	}

}

func main() {
	pflag.Parse()

	if !*flMod && !*flPkg {
		fmt.Fprintf(os.Stderr, "one of --mod (-m) or --pkg (-p) must be specified\n")
		os.Exit(1)
	}
	if *flMod && *flPkg {
		fmt.Fprintf(os.Stderr, "only one of --mod (-m) and --pkg (-p) may be specified\n")
		os.Exit(1)
	}

	targets := pflag.Args()
	if len(targets) == 0 {
		targets = append(targets, ".")
	}
	debug("targets:", targets)

	if *flMod {
		for _, mod := range targets {
			visitModule(mod)
		}
	} else {
		visitPackages(targets...)
	}

	switch *flOut {
	case "struct":
		dumpStructured()
	case "makevars":
		dumpMakeVars()
	case "spew":
		dumpSpew()
		//FIXME: default
	}
}

func visitModule(mod string) {
	debug("visiting module", mod)
	cfg := packages.Config{
		Mode:  packages.NeedName | packages.NeedFiles | packages.NeedImports | packages.NeedDeps | packages.NeedModule,
		Dir:   mod,
		Tests: false,
	}
	//log.Printf("mod-path: %v\n", mod)
	pkgs, err := packages.Load(&cfg, "./...")
	//log.Printf("loaded:   %v\n", mod)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error loading packages in module %q: %v\n", mod, err)
		os.Exit(1)
	}

	for _, p := range pkgs {
		visitPackage(p) //FIXME: errors
	}
	//log.Printf("done:     %v\n", mod)
}

func visitPackages(pkgNames ...string) {
	debug("visiting packages", pkgNames)
	//FIXME: need to set GO111MODULE=off to load packages across modules
	cfg := packages.Config{
		Mode:  packages.NeedName | packages.NeedFiles | packages.NeedImports | packages.NeedDeps | packages.NeedModule,
		Tests: false,
	}
	//log.Printf("loading %d:\n", len(pkgNames))
	pkgs, err := packages.Load(&cfg, pkgNames...)
	//log.Printf("loaded  %d:\n", len(pkgNames))
	if err != nil {
		fmt.Fprintf(os.Stderr, "error loading packages: %v\n", err)
		os.Exit(1)
	}

	for _, p := range pkgs {
		visitPackage(p) //FIXME: errors
	}
	//log.Printf("done\n")
}

func visitPackage(pkg *packages.Package) map[string]bool {
	debug("visiting package", pkg.ID)
	if len(pkg.Errors) > 0 {
		debug("  package", pkg.ID, "has errors:", pkg.Errors)
		return nil
	}
	if pkgMap[pkg.ID] == pkg {
		debug("  package", pkg.ID, "already visited")
		return pkgDeps[pkg.ID]
	}
	if pkgMap[pkg.ID] != nil {
		debug("  package", pkg.ID, "already visited, but not the same")
		// HACK do we really need this?
		//log.Printf("revisit %q", pkg.ID)
		oldModDir := ""
		newModDir := ""
		if pkgMap[pkg.ID].Module != nil {
			oldModDir = pkgMap[pkg.ID].Module.Dir
		}
		if pkg.Module != nil {
			newModDir = pkg.Module.Dir
		}
		if len(newModDir) > len(oldModDir) {
			pkgMap[pkg.ID] = pkg
		}
		return pkgDeps[pkg.ID]
	}

	if len(pkg.GoFiles) == 0 {
		debug("  package", pkg.ID, "has no gofiles")
		return nil
	}
	// hack - don't descend into GOROOT, has to be a better way
	if strings.HasPrefix(pkg.GoFiles[0], "/usr") {
		debug("  package", pkg.ID, "is stdlib")
		return nil
	}
	pkgDir, _ := filepath.Abs(path.Dir(pkg.GoFiles[0]))
	if !strings.HasSuffix(pkgDir, "/") {
		pkgDir = pkgDir + "/"
	}
	if prune(pkgDir) {
		debug("  package", pkg.ID, "was pruned")
		return nil
	}
	debug("  package", pkg.ID, "is new")

	//n := time.Now()
	imps := map[string]bool{}
	for _, p := range pkg.Imports {
		pi := visitPackage(p)
		imps[p.ID] = true
		merge(imps, pi)
	}

	pkgIdx = append(pkgIdx, pkg.ID)
	pkgMap[pkg.ID] = pkg
	debug("  package", pkg.ID, "saved")
	pkgDeps[pkg.ID] = imps
	pkgDirs[pkg.ID] = pkgDir
	//fmt.Printf("%q took %v\n", pkg.ID, time.Since(n))
	return imps
}

func merge(lhs, rhs map[string]bool) {
	for k := range rhs {
		lhs[k] = true
	}
}

func flatten(m map[string]bool) []string {
	l := make([]string, 0, len(m))
	for k := range m {
		l = append(l, k)
	}
	sort.Strings(l)
	return l
}

func dumpStructured() {
	for _, k := range pkgIdx {
		pkg := pkgMap[k]
		fmt.Printf("Package %q {\n", pkg.ID)
		fmt.Printf("  Name: %q\n", pkg.Name)
		fmt.Printf("  PkgPath: %q\n", pkg.PkgPath)
		if pkg.Module != nil {
			fmt.Printf("  Module: %q\n", pkg.Module.Path)
			fmt.Printf("  ModuleDir: %q\n", pkg.Module.Dir)
		}
		fmt.Printf("  GoFiles: [\n")
		for _, f := range pkg.GoFiles {
			fmt.Printf("    %q\n", f)
		}
		fmt.Printf("  ]\n")
		fmt.Printf("  Imports: [\n")
		for _, i := range pkg.Imports {
			fmt.Printf("    %q\n", i.ID)
		}
		fmt.Printf("  ]\n")
		fmt.Printf("}\n")
	}
}

func dumpMakeVars() {
	for _, k := range pkgIdx {
		pkg := pkgMap[k]
		deps := flatten(pkgDeps[k])
		files := allFiles(pkg, deps)
		if len(files) == 0 {
			// TODO: we probably should output a dep on the dir itself, but
			// without files I am not sure how to find that.
			continue
		}
		//FIXME: this outputs "k8s.io/api/...apps/v1", but I need
		//"k8s.io/kubernetes/vendor/api"
		fmt.Printf("%s :=", pkg.ID)
		dir := ""
		for _, f := range files {
			d := path.Dir(f)
			if d != dir {
				fmt.Printf(" \\\n  %s/", d)
				dir = d
			}
			fmt.Printf(" \\\n  %s", f)
		}
		fmt.Printf("\n")
	}
}

func allFiles(pkg *packages.Package, deps []string) []string {
	out := make([]string, 0, len(pkg.GoFiles)+len(deps))
	out = append(out, pkg.GoFiles...)
	for _, dep := range deps {
		if pkgMap[dep] != nil {
			out = append(out, pkgMap[dep].GoFiles...)
		}
	}
	return out
}

func dumpSpew() {
	for _, k := range pkgIdx {
		pkg := pkgMap[k]
		cfg := spew.ConfigState{
			DisableMethods: true,
			Indent:         "    ",
			MaxDepth:       5,
		}
		cfg.Dump(pkg)
	}
}

@mvdan
Copy link
Member

mvdan commented Jan 17, 2021

Thanks! While I look at that, I thought I'd point you towards Go's build cache. It already keeps track of "cache IDs" or keys for each package, which change if its source code or any transitive dependencies change. That might be a faster and possibly simpler approach to accomplish this goal, and it also mimics exactly what go build will see. For example:

$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/94/94ec43b481b9e23b8ef50d616abb35eed35f8aaf7d924f1c55db5820c60f336a-d
$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/94/94ec43b481b9e23b8ef50d616abb35eed35f8aaf7d924f1c55db5820c60f336a-d
$ echo "package downwardapi" > ./pkg/volume/downwardapi/rebuild.go
$ go list -export -f '{{.Export}}' ./pkg/volume/downwardapi
/home/mvdan/.cache/go-build/64/6458d1a877ba9f7adbda4f9ba61ae5120c7b3911273d2cace5309eeb06d0bc54-d

The basic idea is: if that path/string changes, it's a different build. The command will build the package(s) in question, but that shoudl be fast if they are already in the build cache. Depends on whether you want to leave the heavy lifting to cmd/go or to make, I guess. I'd lean towards the former, just for consistency.

@mvdan
Copy link
Member

mvdan commented Jan 17, 2021

I just realised you can obtain all the information you need from just the root module, because it includes all the other k8s.io modules in the vendor folder. I think that should result in a working program that takes a few seconds as well. I'll share some code later today.

@mvdan
Copy link
Member

mvdan commented Jan 17, 2021

Alright, here you go: mvdan/kubernetes@736c51e

See the commit message for details. Runs well for me in module mode, and even a bit faster than the old program in GOPATH mode.

If you wonder why use go list -json directly instead of go/packages - because the latter is build-system-agnostic, so it hides internal details that we need like the Goroot field.

@thockin
Copy link
Author

thockin commented Jan 19, 2021

First, thanks for the time!

go list -export looks like it does what I need, with a couple wrinkles.

  1. A change to one of the "hub" packages (imported by many others) can take a long time to recalculate. Playing with it, it took upwards of 15 seconds once, but I can't reproduce it now. I have no viz into why.

  2. Kubernetes gets a bunch of directory <dir> is not a package listed in vendor/modules.txt. This is happening because of the way we handle staging/src/... I can't use the staging dirs themselves because I get main module (k8s.io/kubernetes) does not contain package <dir>. I will have to see if I can fix that, but it's annoying.

I think this would be an ideal state if I can get there - I'd rather NOT reinvent parts of go. That said, it will require some retooling of the rest of the build, which assumes a bunch of directory-specific variables.

If that hits a dead-end, I'll circle back to your rebuild of go2make. I suspect, by reading it, that it does not quite satisfy - I need to also be able to list modules that the k8s "main" module does not depend on, which is where I get the need to list multiple packages and the main module ... does not contain package ... error.

@mvdan
Copy link
Member

mvdan commented Jan 19, 2021

go list -export is slow because it needs to rebuild the packages it gets passed, since its result is pointing you at the compiled packages on disk. The only other similar bit of info you could possibly use is go list -f '{{.ImportPath}} {{.Stale}}', which tells you if a package is up to date in the build cache - but that's not really what you want, either.

Ideally there would be a flag similar to -export to obtain some sort of build cache key, but without actually having to fill the build cache. I don't think that exists today, since a package's cache key contains the resulting build of all dependencies as part of its hash. So, to compute a package's cache key, you must have built all dependencies first. You can read the first 100 or so lines in https://golang.org/src/cmd/go/internal/work/buildid.go, if you're curious :)

Maybe @bcmills @jayconrod @matloob have more ideas in terms of how you could piggyback off of Go's own build cache. Telling if a package (or its dependencies) has changed between two points in time seems generally useful, without having to fill the build cache first. Arguably that would be a fairly different caching system, but cmd/go could reuse a lot of code if it wanted to offer such a feature.

@thockin
Copy link
Author

thockin commented Jan 19, 2021

Yeah, I realized this is what is happening, which would maybe be OK if we could re-use the built results, but we have some build flags we pass, so I am not even sure we can.

The package listed in vendor/modules.txt error is a pain. It's coming because some of the packages that need codegen are testing-only and are only used by tests in other vendor/* packages (no deps from the "main" module"). go mod vendor specifically does not include test code, but kubernetes is too clever for our own good and defeats that.

If not for the main module ... does not contain package error, I could access those thru our staging/ directory, but I can't.

When I tried to run your new tool against my model repo (much easier to validate than all of kubernetes) I get:

2021/01/19 14:13:12 subdir/file.go:4:2: cannot find package "." in:
	/tmp/go-list-modules/vendor/example.com/m/submod/used
exit status 1

I will have to debug later. Out of time for now.

@bcmills
Copy link
Contributor

bcmills commented Jan 20, 2021

Telling if a package (or its dependencies) has changed between two points in time seems generally useful, without having to fill the build cache first.

If you've actually built the package using the go command, you can use the Stale field of the go list output.

Otherwise, probably the best you can do with go list is to use the …Files fields and manually check them for diffs.

@liggitt
Copy link
Contributor

liggitt commented Feb 19, 2021

Kubernetes has several tools (the go2make one discussed here, as well as code generation tools that make extensive use of go/build) that either run significantly slower (> order of magnitude) or get different results w.r.t. package import paths when running in module mode.

If go is planning to drop support for GOPATH in 1.17, have those behavior and performance changes been communicated?

  • Is there a clear "if you're using this method in go/build, this is what you should do with golang.org/x/tools/go/packages instead" guide?
  • What are the compatibility guarantees around the go/build package behavior and performance?

@jayconrod
Copy link
Contributor

Just to explain a bit of what's going on under the hood:

  • In both GOPATH mode and module mode, go/build.ImportDir loads information from a given directory without loading dependencies. It doesn't do anything different in module mode. Using this should stay fast.
  • In GOPATH mode, go/build.Import resolves an import path to a directory by searching GOPATH and vendor. That's usually fast. Again, it doesn't load dependencies.
  • In module mode, go/build.Import uses go list to resolve an import path to a directory.
  • go list always loads dependencies. It probably doesn't need to do that in GOPATH mode, but in module mode, if any transitively imported packages are missing, we might need to change the build list, which could change selected versions of other dependencies.
  • Because of that, if you're calling go/build.Import for n packages, switch from GO111MODULE=off to GO111MODULE=on is like switching from O(n) to O(n2) (using that notation very loosely) because the same shared set of dependencies is loaded many times.
  • This is a big part of why golang.org/x/tools/go/packages was written. Most tools before modules were based on go/build, but it was clear that wasn't going to be powerful enough in the long term. We recommend that golang.org/x/tools/go/packages is used in most cases.
  • Like build.Import, packages.Load is based on go list, but unlike build.Import, packages.Load can load a batch of packages at once. So loading n packages should take O(n) time if it's done with one packages.Load call (again, very loosely).

@liggitt To answer your questions directly:

If go is planning to drop support for GOPATH in 1.17, have those behavior and performance changes been communicated?

Yes, but probably not well enough. When we published packages a couple years ago, we worked directly with tool owners and folks in the tools working group. gopls has also replaced a lot of GOPATH-only tools.

If it would be helpful to add some advice to the go/build and golang.org/x/tools/go/packages documentation, I'd be happy to open an issue and do that for 1.17. Let me know if anything else would be useful in addition.

Is there a clear "if you're using this method in go/build, this is what you should do with golang.org/x/tools/go/packages instead" guide?

I'm not sure we have anything that specific. @matloob do you know if there's anything like that?

What are the compatibility guarantees around the go/build package behavior and performance?

go/build is covered by the Go 1 compatibility guidelines. It should continue to work in module-aware mode, but we don't have any specific performance guarantees.

@bcmills
Copy link
Contributor

bcmills commented Feb 24, 2021

  • go list always loads dependencies. It probably doesn't need to do that in GOPATH mode, but in module mode, if any transitively imported packages are missing, we might need to change the build list, which could change selected versions of other dependencies.

FWIW, I don't think that go list should implicitly load dependencies in module mode. (That's arguably part of #29666.) But that's going to be a lot of work to fix, I think, including cleaning up a lot of tests that assume it today.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Tools This label describes issues relating to any tools in the x/tools repository. ToolSpeed
Projects
None yet
Development

No branches or pull requests

9 participants