Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/go: allow -toolexec tools to opt in to build caching #41145

Closed
mvdan opened this issue Aug 31, 2020 · 14 comments
Closed

proposal: cmd/go: allow -toolexec tools to opt in to build caching #41145

mvdan opened this issue Aug 31, 2020 · 14 comments
Labels
FrozenDueToAge GoCommand cmd/go Proposal Tools This label describes issues relating to any tools in the x/tools repository. ToolSpeed
Milestone

Comments

@mvdan
Copy link
Member

mvdan commented Aug 31, 2020

From go help build:

        -toolexec 'cmd args'
                a program to use to invoke toolchain programs like vet and asm.
                For example, instead of running asm, the go command will run
                'cmd args /path/to/asm <arguments for asm>'.

-toolexec is a very powerful flag to control or modify how Go's toolchain programs get run. The classic use case is for development of Go itself; for example, using a program to measure how much CPU and wall time each compiler and linker invocation takes, or to insert a debugger in a particular toolchain program execution.

Issue #27628 already covers an existing problem: right now, the use of -toolexec does not invalidate the build cache, which is a problem in both of the examples above. We agree that that bug should be fixed, but consider it to be a separate issue.

However, we argue that it would be reasonable for a toolexec program's output to be cached when the program is deterministic. The use case we have in mind is tools which alter a Go build predictably; for example, a code obfuscator which modifies the arguments passed to the compiler and linker in a deterministic and reproducible way. Without caching, such a tool is extremely slow, especially for big projects with many packages.

The way go build handles caching properly for go tool compile seems to be to first ask it compile -V=full, which shows the compiler's full version, and then use that as part of the cache key. This means that the "package build" operations will need to be run again if the compiler version changes or if the input changes, but not otherwise.

We propose adding similar functionality when using -toolexec. For example, when running go build -toolexec=mytool, the Go tool would run the tool with a well defined version flag like mytool -toolversion, similar to the compiler's -V=full. If it returns a non-zero exit code, we do no caching at all, which would align with the non-deterministic use cases in #27628. If the command succeeds, its output is added to the cache key, and proper build caching takes place.

We think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.

We should also define the bare minimum of information for a -toolversion flag to report. go tool compile -V=full seems to include the toolchain's version as well as the compiler's package build ID:

$ go tool compile -V=full
compile version devel +54e18f1c2a Fri Aug 28 20:01:41 2020 +0000 buildID=DU3bt_lJqYe1kVhgbnOf/1cyl58sUXfIZ9K-RctHM/5cl10_pQMFsyS4FJFGwM/fqivXvY1n7nVWhRpqqqP

We realise that this proposal is somewhat far-fetched, and the easy answer might be "-toolexec was never meant for this kind of use". However, we argue that it opens up a very exciting world of possibilities in Go tooling, especially with tools that closely integrate with the toolchain itself to alter builds.

#29430 is a related proposal, but it also has a very different objective. It proposes an entirely new interface for code rewriting tools, which is yet to be defined. We simply propose to improve the existing -toolexec interface in a very specific and backwards-compatible way. We also think both features could coexist; not all build-altering tools will simply need to alter source code - for example, the code obfuscator shown above also enforces some linker flags like -w -s.

/cc @bcmills @cherrymui @ianlancetaylor @josharian @jayconrod @ianthehat

@mvdan mvdan added Proposal ToolSpeed GoCommand cmd/go Tools This label describes issues relating to any tools in the x/tools repository. labels Aug 31, 2020
@mvdan mvdan added this to the Proposal milestone Aug 31, 2020
@bcmills
Copy link
Contributor

bcmills commented Aug 31, 2020

The problem with general caching for toolexec is that we can't tell whether the tool is idempotent (such as a transformer or annotator or some sort of idempotent analyzer), or intended specifically to measure repeated invocations (such as a profiler or tracer), or even consistent from run to run (such as an interactive debugger with user-supplied edits to arbitrary memory locations).

Tool versioning aside, I think we would also need some way to determine whether the output is actually intended to be cached.

@mvdan
Copy link
Member Author

mvdan commented Aug 31, 2020

we would also need some way to determine whether the output is actually intended to be cached.

That's what the existence of the flag is meant to convey. We could call it -gocachekey if you think -toolversion is not specific enough.

@ianlancetaylor ianlancetaylor changed the title Proposal: cmd/go: allow -toolexec tools to opt in to build caching proposal: cmd/go: allow -toolexec tools to opt in to build caching Aug 31, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Aug 31, 2020
@rsc
Copy link
Contributor

rsc commented Sep 2, 2020

Consider go build -toolexec myexec.
It sounds like you are saying to run myexec -V=full.
Why are we not running myexec compile -V=full?
Why does myexec matter at all?
What matters is what happens in the tools it runs.
Using "myexec compiler -V=full" would work for toolstash.

I see #27628 and wanting to get timing repeated, but that seems orthogonal to the use of toolexec.
You can't run "time go build" twice in a row and get consistent answers either (the second uses results cached by the first).
"go build -toolexec=time" behaving like a fine-grained "time go build" does not bother me too much.
Maybe for that case you want to be able to checkpoint the cache and rewind.
(Delete all cached results past a certain moment, or maybe "go build -do-not-write-new-things-to-cache".)

@mvdan
Copy link
Member Author

mvdan commented Sep 5, 2020

It sounds like you are saying to run myexec -V=full.
Why are we not running myexec compile -V=full?
Why does myexec matter at all?

Both myexec -gocachekey and go tool compile -V=full would be in the cache key when running go build -toolexec myexec, now that I think about it. This is because myexec and go tool compile are independent and can change their behavior separately.

Running only myexec compile -V=full and expecting that to contain both cache key inputs at once is an option, too. I don't feel strongly one way or another.

@rsc
Copy link
Contributor

rsc commented Sep 16, 2020

What kinds of things do you envision myexec doing that would make only using compile -V=full not good enough as a cache key? Myexec's job should be to run some kind of executable. Would it ever change the behavior of that executable so that it didn't work the way it was compiled to work? (How it was compiled to work should be covered by the -V=full output.)

@mvdan
Copy link
Member Author

mvdan commented Sep 16, 2020

The immediate use case is indeed altering Go builds, such as changing the input Go source or altering the flags passed to the compiler and linker. This is how I implemented a binary obfuscator, which I tried to explain in the original post.

Any tool which aims to build a Go binary with altered Go source code would also benefit from this. #29430 shows fuzzing as an example, since one needs to instrument the code for fuzzing.

@rsc
Copy link
Contributor

rsc commented Sep 18, 2020

@mdvan In that case it seems pretty clear that the tool that is altering the behavior of the compiler should be responsible for altering the -V=full output as well.

@mvdan
Copy link
Member Author

mvdan commented Sep 18, 2020

That's actually a good point, I'm not sure why I never thought of it. I'll give that a try; please give me a week before making a decision on this proposal.

@rsc
Copy link
Contributor

rsc commented Sep 18, 2020

For the record (because I wasn't sure above), go build -toolexec=myexec already runs myexec compile -V=full:

% go build -toolexec=echo helloworld.go
go tool compile -V=full: unexpected output:
	/Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full
go tool compile -V=full: unexpected output:
	/Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full
go tool compile -V=full: unexpected output:
	/Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full

@mvdan
Copy link
Member Author

mvdan commented Sep 18, 2020

Indeed, it works. Here's a proof of concept I wrote today: burrowers/garble@308e4ae

Unfortunately, via plain go build I don't have a good way to obtain myexec's own version. For now, I just do a sha256 sum of os.Executable, and add the flags passed to myexec which affect the output. Hopefully #37475 can be fixed soon, as then I'd avoid the extra I/O.

I'll leave this proposal open for a few more days, just in case this current solution isn't enough for any other tools or people following this thread. For my own immediate use, it appears I don't need any extra features after all, so I'm happy.

@mvdan
Copy link
Member Author

mvdan commented Sep 22, 2020

Alright, withdrawing proposal as per my comment above. Thanks again @rsc.

@mvdan mvdan closed this as completed Sep 22, 2020
@mvdan mvdan moved this from Active to Declined in Proposals (old) Sep 22, 2020
mvdan added a commit to burrowers/garble that referenced this issue Sep 22, 2020
As per the discussion in golang/go#41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the build cache key.

We add a number of things to the -V=full output. First, "+garble" since
that is the relatively unique name of the Go program. Second, the
version of Garble itself. Since we can't do this via modules until
golang/go#37475, we instead use the
hex-encoded sha256 of our own binary.

Finally, we need to add the garble options which modify how we obfuscate
code, since each should result in different build cache keys. GOPRIVATE
also affects caching, since a different GOPRIVATE value means that we
might have to garble a different set of packages.

This feature works, with only a minor regression in the ldflags test
since the -X linker flag is now broken with private names. The following
commit will fix that.
@rsc
Copy link
Contributor

rsc commented Sep 23, 2020

Unfortunately, via plain go build I don't have a good way to obtain myexec's own version.

I don't understand this. Every program gets a build ID, right? It seems like you should be able to fetch the build ID from os.Executable instead of doing the full sha256 of the executable.

@mvdan
Copy link
Member Author

mvdan commented Sep 23, 2020

Oh, I was completely unaware that I could use go tool buildid on a final Go binary. I thought it would only work on object files. That indeed seems to be significantly faster on a 7MiB binary.

@mvdan
Copy link
Member Author

mvdan commented Sep 23, 2020

Indeed, it works. Here's a proof of concept I wrote today: burrowers/garble@308e4ae

I should note that it does not work just yet, because my implementation was far too naive. I need to first properly read the documentation on what build IDs are (from src/cmd/go/internal/work/buildid.go), and learn how to alter the hashes in the right way without altering the format.

So, it should still work, I just went about it the wrong way in my first attempt.

mvdan added a commit to burrowers/garble that referenced this issue Sep 26, 2020
As per the discussion in golang/go#41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the printed build ID.

The part of the build ID that matters is the last, since it's the
"content ID" which is used to work out whether there is a need to redo
the action (build) or not. Since cmd/go parses the last word in the
output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
The slashes let us imitate a full binary build ID, but we assume that
the other components such as the action ID are not necessary, since the
only reader here is cmd/go and it only consumes the content ID.

The reported content ID includes the tool's original content ID,
garble's own content ID from the built binary, and the garble options
which modify how we obfuscate code. If any of the three changes, we
should use a different build cache key. GOPRIVATE also affects caching,
since a different GOPRIVATE value means that we might have to garble a
different set of packages.
mvdan added a commit to burrowers/garble that referenced this issue Oct 7, 2020
As per the discussion in golang/go#41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the printed build ID.

The part of the build ID that matters is the last, since it's the
"content ID" which is used to work out whether there is a need to redo
the action (build) or not. Since cmd/go parses the last word in the
output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
The slashes let us imitate a full binary build ID, but we assume that
the other components such as the action ID are not necessary, since the
only reader here is cmd/go and it only consumes the content ID.

The reported content ID includes the tool's original content ID,
garble's own content ID from the built binary, and the garble options
which modify how we obfuscate code. If any of the three changes, we
should use a different build cache key. GOPRIVATE also affects caching,
since a different GOPRIVATE value means that we might have to garble a
different set of packages.

Include tests, which mainly check that 'garble build -v' prints package
lines when we expect to always need to rebuild packages, and that it
prints nothing when we should be reusing the build cache even when the
built binary is missing.

After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my
machine, whereas it used to be at around 25s before.
mvdan added a commit to burrowers/garble that referenced this issue Oct 8, 2020
As per the discussion in golang/go#41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the printed build ID.

The part of the build ID that matters is the last, since it's the
"content ID" which is used to work out whether there is a need to redo
the action (build) or not. Since cmd/go parses the last word in the
output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
The slashes let us imitate a full binary build ID, but we assume that
the other components such as the action ID are not necessary, since the
only reader here is cmd/go and it only consumes the content ID.

The reported content ID includes the tool's original content ID,
garble's own content ID from the built binary, and the garble options
which modify how we obfuscate code. If any of the three changes, we
should use a different build cache key. GOPRIVATE also affects caching,
since a different GOPRIVATE value means that we might have to garble a
different set of packages.

Include tests, which mainly check that 'garble build -v' prints package
lines when we expect to always need to rebuild packages, and that it
prints nothing when we should be reusing the build cache even when the
built binary is missing.

After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my
machine, whereas it used to be at around 25s before.
mvdan added a commit to burrowers/garble that referenced this issue Oct 11, 2020
As per the discussion in golang/go#41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the printed build ID.

The part of the build ID that matters is the last, since it's the
"content ID" which is used to work out whether there is a need to redo
the action (build) or not. Since cmd/go parses the last word in the
output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
The slashes let us imitate a full binary build ID, but we assume that
the other components such as the action ID are not necessary, since the
only reader here is cmd/go and it only consumes the content ID.

The reported content ID includes the tool's original content ID,
garble's own content ID from the built binary, and the garble options
which modify how we obfuscate code. If any of the three changes, we
should use a different build cache key. GOPRIVATE also affects caching,
since a different GOPRIVATE value means that we might have to garble a
different set of packages.

Include tests, which mainly check that 'garble build -v' prints package
lines when we expect to always need to rebuild packages, and that it
prints nothing when we should be reusing the build cache even when the
built binary is missing.

After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my
machine, whereas it used to be at around 25s before.
capnspacehook pushed a commit to burrowers/garble that referenced this issue Oct 11, 2020
As per the discussion in golang/go#41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the printed build ID.

The part of the build ID that matters is the last, since it's the
"content ID" which is used to work out whether there is a need to redo
the action (build) or not. Since cmd/go parses the last word in the
output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
The slashes let us imitate a full binary build ID, but we assume that
the other components such as the action ID are not necessary, since the
only reader here is cmd/go and it only consumes the content ID.

The reported content ID includes the tool's original content ID,
garble's own content ID from the built binary, and the garble options
which modify how we obfuscate code. If any of the three changes, we
should use a different build cache key. GOPRIVATE also affects caching,
since a different GOPRIVATE value means that we might have to garble a
different set of packages.

Include tests, which mainly check that 'garble build -v' prints package
lines when we expect to always need to rebuild packages, and that it
prints nothing when we should be reusing the build cache even when the
built binary is missing.

After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my
machine, whereas it used to be at around 25s before.
@golang golang locked and limited conversation to collaborators Sep 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GoCommand cmd/go Proposal Tools This label describes issues relating to any tools in the x/tools repository. ToolSpeed
Projects
No open projects
Development

No branches or pull requests

4 participants