Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/packages: Load in NeedName|NeedFiles mode can be sped up significantly because providing answer doesn't require internet use #31893

Closed
dmitshur opened this issue May 7, 2019 · 1 comment
Labels
FrozenDueToAge modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dmitshur
Copy link
Contributor

dmitshur commented May 7, 2019

This issue is related to #29452 and #31087, but is more narrowly focused on the performance of the packages.NeedName | packages.NeedFiles mode.

Background

The Go specification defines Go source file organization that is very conducive of extracting partial information with high performance and efficiency.

Specifically, each .go file always has 3 categories of content in well-specified order. The package clause is always first, followed by a possibly empty set of import declarations, and then finally the rest of its content.

This means that to find the package name in a .go file, a Go parser can stop after it parses the package clause. To find all imported packages in a .go file, the parser can stop after it reaches the first non-import declaration.

The Go parser provided by the go/parser package provides fine control over when to stop parsing a .go file via a Mode type:

const (
	PackageClauseOnly Mode = 1 << iota // stop parsing after package clause
	ImportsOnly                        // stop parsing after import declarations
	...
)

The Go specification also describes the concept of a Go package as follows:

Go programs are constructed by linking together packages. A package in turn is constructed from one or more source files that together declare constants, types, variables and functions belonging to the package and which are accessible in all files of the same package.

The go command defines a 4th interesting category of information, build tags or constraints. They are also highly constrained, in that they can appear at the top of a .go source file only. More precisely, from go/build documentation:

Constraints may appear in any kind of source file (not just Go), but they must appear near the top of the file, preceded only by blank lines and other line comments. These rules mean that in Go files a build constraint must appear before the package clause.

The go/build package makes good use of these properties when loading a Go package. The resolved package provides information about the package name, Go and other source files, and imports. To acquire that information, it invokes the Go parser in the following mode:

parser.ParseFile([...], parser.ImportsOnly|parser.ParseComments)

It stops parsing after import declarations, and it needs to parse comments in order to find the build constraint comments. At no point is it necessary to use the internet, since all the necessary information is already available in the provided .go files.

API of go/build

The go/build API does not offer a lot of control over the process of loading a package. It has a ImportMode type which can either be 0 to load Go packages normally (almost all information), or FindOnly:

const (
	// If FindOnly is set, Import stops after locating the directory
	// that should contain the sources for a package. It does not
	// read any files in the directory.
	FindOnly ImportMode = 1 << iota

	...
)

API of go/packages

In contrast to that, the go/packages API provides a much greater degree of freedom about requesting information about the loaded package. See the LoadMode type:

const (
	// NeedName adds Name and PkgPath.
	NeedName LoadMode = 1 << iota

	// NeedFiles adds GoFiles and OtherFiles.
	NeedFiles

	// NeedCompiledGoFiles adds CompiledGoFiles.
	NeedCompiledGoFiles

	// NeedImports adds Imports. If NeedDeps is not set, the Imports field will contain
	// "placeholder" Packages with only the ID set.
	NeedImports

	// NeedDeps adds the fields requested by the LoadMode in the packages in Imports. If NeedImports
	// is not set NeedDeps has no effect.
	NeedDeps

	...
)

As a result, it should be possible to achieve equal or better performance with go/packages API when the user requests as little information as just NeedName or NeedFiles (or both) without NeedImports, since it wouldn't be necessary to parse the .go files beyond the package clause.

Environment

$ go version
go version go1.12.5 darwin/amd64

All timing information was collected on a 2017 MacBook Pro with 256 GB flash storage.

Issue

The performance when using the go/packages package in packages.NeedName | packages.NeedFiles mode is significantly worse compared to the go/build package.

To reproduce the issue, create a target package to be imported in a temporary directory /tmp/target with the following files:

-- go.mod --
module m

go 1.12

require (
	github.com/google/go-github v17.0.0+incompatible
	github.com/google/go-querystring v1.0.0 // indirect
)

-- go.sum --
github.com/google/go-github v17.0.0+incompatible h1:N0LgJ1j65A7kfXrZnUDaYCs/Sf4rEjNlfyDHW9dolSY=
github.com/google/go-github v17.0.0+incompatible/go.mod h1:zLgOLi98H3fifZn+44m+umXrS52loVEgC2AApnigrVQ=
github.com/google/go-querystring v1.0.0 h1:Xkwi/a1rcvNg1PPYe5vI8GbeBY/jrVuDX5ASuANWTrk=
github.com/google/go-querystring v1.0.0/go.mod h1:odCYkC5MyYFN7vkCjXpyrEuKhc/BUO6wN/zVPAxq5ck=

-- main.go --
// +build ignore

package main

func main() {}

-- p.go --
package p

import "github.com/google/go-github/github"

var _ *github.Client

Then, inside another directory, create the following test command:

-- go.mod --
module m

go 1.12

require golang.org/x/tools v0.0.0-20190506145303-2d16b83fe98c

-- main.go --
package main

import (
	"fmt"
	"go/build"
	"time"

	"golang.org/x/tools/go/packages"
)

func main() {
	for driver, f := range map[string]func(dir string) (name string, goFiles []string, _ error){
		"go/build":    useGoBuild,
		"go/packages": useGoPackages,
	} {
		t0 := time.Now()
		name, goFiles, error := f("/tmp/target")
		t1 := time.Now()

		fmt.Println(driver)
		fmt.Println("  time:", t1.Sub(t0))
		fmt.Println("  error:", error)
		if error == nil {
			fmt.Println("  name:", name)
			fmt.Println("  goFiles:", goFiles)
		}
		fmt.Println()
	}
}

func useGoBuild(dir string) (name string, goFiles []string, _ error) {
	p, err := build.ImportDir(dir, 0)
	if err != nil {
		return "", nil, fmt.Errorf("build.ImportDir: %v", err)
	}
	return p.Name, p.GoFiles, nil
}

func useGoPackages(dir string) (name string, goFiles []string, _ error) {
	cfg := &packages.Config{
		Mode: packages.NeedName | packages.NeedFiles,
		Dir:  dir,
	}
	pkgs, err := packages.Load(cfg, ".")
	if err != nil {
		return "", nil, fmt.Errorf("packages.Load: %v", err)
	}
	if len(pkgs) != 1 {
		return "", nil, fmt.Errorf("got %d packages, want exactly 1 package", len(pkgs))
	}
	p := pkgs[0]
	if len(p.Errors) > 0 {
		return "", nil, fmt.Errorf("p.Errors: %v", p.Errors)
	}
	return p.Name, p.GoFiles, nil
}

Run it as follows:

$ export GO111MODULE=on
$ export GOPROXY=direct
$ export GOPATH=$(mktemp -d)
$ go build -o /tmp/loadpkg
$ /tmp/loadpkg
go/build
  time: 1.743083ms
  error: <nil>
  name: p
  goFiles: [p.go]

go/packages
  time: 1.822050861s
  error: <nil>
  name: p
  goFiles: [/tmp/target/p.go]

$ /tmp/loadpkg
go/build
  time: 1.441372ms
  error: <nil>
  name: p
  goFiles: [p.go]

go/packages
  time: 322.397ms
  error: <nil>
  name: p
  goFiles: [/tmp/target/p.go]

Notice that during the first invocation, go/build took 1.7 milliseconds to produce correct results, while go/packages took 1.8 seconds.

On the second invocation, go/build took 1.5 milliseconds (perhaps due to warmer disk cache), and go/packages took 322 milliseconds. Successive runs produce similar results.

go/packages became much faster on second run compared to first because the github.com/google/go-github and github.com/google/go-querystring modules were downloaded and extracted into the local module cache from the internet the first time, which doesn't need to happen the second time.

However, 300~ milliseconds is still significantly slower than 2~ milliseconds that go/build takes to produce the same correct results for this query. Additionally, all the necessary information to answer a query in NeedName|NeedFiles mode is available in the .go files on disk, so nothing needs to be downloaded from internet to return the correct result.

/cc @matloob per owners.

@dmitshur dmitshur added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. modules labels May 7, 2019
@dmitshur dmitshur added this to the Unreleased milestone May 7, 2019
@dmitshur dmitshur changed the title x/tools/go/packages: Load in NeedName|NeedFiles mode up to 1000+ times slower than go/build x/tools/go/packages: Load in NeedName|NeedFiles mode can be sped up significantly because providing answer doesn't require internet use May 27, 2019
@matloob
Copy link
Contributor

matloob commented Jun 19, 2019

Unfortunately there's not much we can do from the go/packages side. We don't plan to re-implement parts of go list's logic here.

If go list provides some way for us to request more fine grained info, maybe it can do less work, and so can the go/packages call, but for now, a significant improvement is not feasible.

@matloob matloob closed this as completed Jun 19, 2019
gopherbot pushed a commit to golang/pkgsite that referenced this issue Jun 15, 2020
Fetch tests are currently quite slow due to packages.Load
(golang/go#31893), and are starting to flake.
For now, just increase the timeout.

Fixes b/132212122

Change-Id: I3c60b3128e8ce3510ee0f35aef27451e30de54e5
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/462080
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@golang golang locked and limited conversation to collaborators Jun 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

3 participants