-
Notifications
You must be signed in to change notification settings - Fork 18k
proposal: cmd/go: allow -toolexec tools to opt in to build caching #41145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The problem with general caching for Tool versioning aside, I think we would also need some way to determine whether the output is actually intended to be cached. |
That's what the existence of the flag is meant to convey. We could call it |
Consider go build -toolexec myexec. I see #27628 and wanting to get timing repeated, but that seems orthogonal to the use of toolexec. |
Both Running only |
What kinds of things do you envision myexec doing that would make only using compile -V=full not good enough as a cache key? Myexec's job should be to run some kind of executable. Would it ever change the behavior of that executable so that it didn't work the way it was compiled to work? (How it was compiled to work should be covered by the -V=full output.) |
The immediate use case is indeed altering Go builds, such as changing the input Go source or altering the flags passed to the compiler and linker. This is how I implemented a binary obfuscator, which I tried to explain in the original post. Any tool which aims to build a Go binary with altered Go source code would also benefit from this. #29430 shows fuzzing as an example, since one needs to instrument the code for fuzzing. |
@mdvan In that case it seems pretty clear that the tool that is altering the behavior of the compiler should be responsible for altering the -V=full output as well. |
That's actually a good point, I'm not sure why I never thought of it. I'll give that a try; please give me a week before making a decision on this proposal. |
For the record (because I wasn't sure above),
|
Indeed, it works. Here's a proof of concept I wrote today: burrowers/garble@308e4ae Unfortunately, via plain I'll leave this proposal open for a few more days, just in case this current solution isn't enough for any other tools or people following this thread. For my own immediate use, it appears I don't need any extra features after all, so I'm happy. |
Alright, withdrawing proposal as per my comment above. Thanks again @rsc. |
As per the discussion in golang/go#41145, it turns out that we don't need special support for build caching in -toolexec. We can simply modify the behavior of "[...]/compile -V=full" and "[...]/link -V=full" so that they include garble's own version and options in the build cache key. We add a number of things to the -V=full output. First, "+garble" since that is the relatively unique name of the Go program. Second, the version of Garble itself. Since we can't do this via modules until golang/go#37475, we instead use the hex-encoded sha256 of our own binary. Finally, we need to add the garble options which modify how we obfuscate code, since each should result in different build cache keys. GOPRIVATE also affects caching, since a different GOPRIVATE value means that we might have to garble a different set of packages. This feature works, with only a minor regression in the ldflags test since the -X linker flag is now broken with private names. The following commit will fix that.
I don't understand this. Every program gets a build ID, right? It seems like you should be able to fetch the build ID from os.Executable instead of doing the full sha256 of the executable. |
Oh, I was completely unaware that I could use |
I should note that it does not work just yet, because my implementation was far too naive. I need to first properly read the documentation on what build IDs are (from src/cmd/go/internal/work/buildid.go), and learn how to alter the hashes in the right way without altering the format. So, it should still work, I just went about it the wrong way in my first attempt. |
As per the discussion in golang/go#41145, it turns out that we don't need special support for build caching in -toolexec. We can simply modify the behavior of "[...]/compile -V=full" and "[...]/link -V=full" so that they include garble's own version and options in the printed build ID. The part of the build ID that matters is the last, since it's the "content ID" which is used to work out whether there is a need to redo the action (build) or not. Since cmd/go parses the last word in the output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}". The slashes let us imitate a full binary build ID, but we assume that the other components such as the action ID are not necessary, since the only reader here is cmd/go and it only consumes the content ID. The reported content ID includes the tool's original content ID, garble's own content ID from the built binary, and the garble options which modify how we obfuscate code. If any of the three changes, we should use a different build cache key. GOPRIVATE also affects caching, since a different GOPRIVATE value means that we might have to garble a different set of packages.
As per the discussion in golang/go#41145, it turns out that we don't need special support for build caching in -toolexec. We can simply modify the behavior of "[...]/compile -V=full" and "[...]/link -V=full" so that they include garble's own version and options in the printed build ID. The part of the build ID that matters is the last, since it's the "content ID" which is used to work out whether there is a need to redo the action (build) or not. Since cmd/go parses the last word in the output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}". The slashes let us imitate a full binary build ID, but we assume that the other components such as the action ID are not necessary, since the only reader here is cmd/go and it only consumes the content ID. The reported content ID includes the tool's original content ID, garble's own content ID from the built binary, and the garble options which modify how we obfuscate code. If any of the three changes, we should use a different build cache key. GOPRIVATE also affects caching, since a different GOPRIVATE value means that we might have to garble a different set of packages. Include tests, which mainly check that 'garble build -v' prints package lines when we expect to always need to rebuild packages, and that it prints nothing when we should be reusing the build cache even when the built binary is missing. After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my machine, whereas it used to be at around 25s before.
As per the discussion in golang/go#41145, it turns out that we don't need special support for build caching in -toolexec. We can simply modify the behavior of "[...]/compile -V=full" and "[...]/link -V=full" so that they include garble's own version and options in the printed build ID. The part of the build ID that matters is the last, since it's the "content ID" which is used to work out whether there is a need to redo the action (build) or not. Since cmd/go parses the last word in the output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}". The slashes let us imitate a full binary build ID, but we assume that the other components such as the action ID are not necessary, since the only reader here is cmd/go and it only consumes the content ID. The reported content ID includes the tool's original content ID, garble's own content ID from the built binary, and the garble options which modify how we obfuscate code. If any of the three changes, we should use a different build cache key. GOPRIVATE also affects caching, since a different GOPRIVATE value means that we might have to garble a different set of packages. Include tests, which mainly check that 'garble build -v' prints package lines when we expect to always need to rebuild packages, and that it prints nothing when we should be reusing the build cache even when the built binary is missing. After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my machine, whereas it used to be at around 25s before.
As per the discussion in golang/go#41145, it turns out that we don't need special support for build caching in -toolexec. We can simply modify the behavior of "[...]/compile -V=full" and "[...]/link -V=full" so that they include garble's own version and options in the printed build ID. The part of the build ID that matters is the last, since it's the "content ID" which is used to work out whether there is a need to redo the action (build) or not. Since cmd/go parses the last word in the output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}". The slashes let us imitate a full binary build ID, but we assume that the other components such as the action ID are not necessary, since the only reader here is cmd/go and it only consumes the content ID. The reported content ID includes the tool's original content ID, garble's own content ID from the built binary, and the garble options which modify how we obfuscate code. If any of the three changes, we should use a different build cache key. GOPRIVATE also affects caching, since a different GOPRIVATE value means that we might have to garble a different set of packages. Include tests, which mainly check that 'garble build -v' prints package lines when we expect to always need to rebuild packages, and that it prints nothing when we should be reusing the build cache even when the built binary is missing. After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my machine, whereas it used to be at around 25s before.
As per the discussion in golang/go#41145, it turns out that we don't need special support for build caching in -toolexec. We can simply modify the behavior of "[...]/compile -V=full" and "[...]/link -V=full" so that they include garble's own version and options in the printed build ID. The part of the build ID that matters is the last, since it's the "content ID" which is used to work out whether there is a need to redo the action (build) or not. Since cmd/go parses the last word in the output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}". The slashes let us imitate a full binary build ID, but we assume that the other components such as the action ID are not necessary, since the only reader here is cmd/go and it only consumes the content ID. The reported content ID includes the tool's original content ID, garble's own content ID from the built binary, and the garble options which modify how we obfuscate code. If any of the three changes, we should use a different build cache key. GOPRIVATE also affects caching, since a different GOPRIVATE value means that we might have to garble a different set of packages. Include tests, which mainly check that 'garble build -v' prints package lines when we expect to always need to rebuild packages, and that it prints nothing when we should be reusing the build cache even when the built binary is missing. After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my machine, whereas it used to be at around 25s before.
For reference, the comments in cmd/go about calling |
I realise that what my tool does to overload Go caching is in undocumented territory, and that's fine by me - it's broken slightly a couple of times and I've kept up with upstream changes over time. If Go starts providing a documented and official way for me to accomplish the same, I'll switch to it. If not, I'll continue with the current approach. After all, much of (I'm going to unlock this discussion briefly just to see if anyone else wants to chip in as well) |
Hi @mvdan, Just to confirm -- based on burrowers/garble#142 and it sounds like some subsequent tweaks, you've been able to have (@felixge might open an issue or proposal about what guarantees |
Yes, caching works and it's entirely transparent to the end user. It's just fairly hacky on the inside, so it took me some effort to figure out properly. I'm happy to do a video call if any of you would like a demo or have any questions about it. |
Looping in @RomainMuller and @Julio-Guerra who are working on our toolexec based instrumentation project https://github.com/DataDog/orchestrion. Romain is planning to file an issue to explain what we’re doing with -toolexec, why we’re doing it this way, and what challenges we face on the way, and for which we’d appreciate toolchain evolution. We might take you up on the offer for a meeting @mvdan 🙇 |
From
go help build
:-toolexec
is a very powerful flag to control or modify how Go's toolchain programs get run. The classic use case is for development of Go itself; for example, using a program to measure how much CPU and wall time each compiler and linker invocation takes, or to insert a debugger in a particular toolchain program execution.Issue #27628 already covers an existing problem: right now, the use of
-toolexec
does not invalidate the build cache, which is a problem in both of the examples above. We agree that that bug should be fixed, but consider it to be a separate issue.However, we argue that it would be reasonable for a toolexec program's output to be cached when the program is deterministic. The use case we have in mind is tools which alter a Go build predictably; for example, a code obfuscator which modifies the arguments passed to the compiler and linker in a deterministic and reproducible way. Without caching, such a tool is extremely slow, especially for big projects with many packages.
The way
go build
handles caching properly forgo tool compile
seems to be to first ask itcompile -V=full
, which shows the compiler's full version, and then use that as part of the cache key. This means that the "package build" operations will need to be run again if the compiler version changes or if the input changes, but not otherwise.We propose adding similar functionality when using
-toolexec
. For example, when runninggo build -toolexec=mytool
, the Go tool would run the tool with a well defined version flag likemytool -toolversion
, similar to the compiler's-V=full
. If it returns a non-zero exit code, we do no caching at all, which would align with the non-deterministic use cases in #27628. If the command succeeds, its output is added to the cache key, and proper build caching takes place.We think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.
We should also define the bare minimum of information for a
-toolversion
flag to report.go tool compile -V=full
seems to include the toolchain's version as well as the compiler's package build ID:We realise that this proposal is somewhat far-fetched, and the easy answer might be "
-toolexec
was never meant for this kind of use". However, we argue that it opens up a very exciting world of possibilities in Go tooling, especially with tools that closely integrate with the toolchain itself to alter builds.#29430 is a related proposal, but it also has a very different objective. It proposes an entirely new interface for code rewriting tools, which is yet to be defined. We simply propose to improve the existing
-toolexec
interface in a very specific and backwards-compatible way. We also think both features could coexist; not all build-altering tools will simply need to alter source code - for example, the code obfuscator shown above also enforces some linker flags like-w -s
./cc @bcmills @cherrymui @ianlancetaylor @josharian @jayconrod @ianthehat
The text was updated successfully, but these errors were encountered: