Go support on Reproducible Builds #57120

Closed
rsc opened this issue Dec 6, 2022 · 38 comments
Labels: NeedsFix (The path to resolution is known, but the work has not been done.)

Comments

@rsc
Contributor

rsc commented Dec 6, 2022

Over at #57001 (comment), @Foxboron wrote:

Quick point while I contemplate if it's worth engaging on this topic as the Arch maintainer for the go package.

One option is to trust in Go's high-fidelity, reproducible builds and let the go command fetch the dependencies directly. I would hope that systems that take this approach are also comfortable letting the go command fetch any toolchain dependency as well, since the toolchain fetches have the same high-fidelity, reproducible behavior as module dependency fetches.

There's a very big difference between downloading source files defined in the go.mod files and fetching binary files from some remote location. We are all very aware of the trusting trust attack, and moving the reproducible builds requirements from the downstream distributor (Linux distributions) to the upstream (Google) is not trivial.

So how is Google going to provide Reproducible Builds for the downloaded toolchains?


Then I wrote:

@Foxboron, regarding "Reproducible Builds", by that do you mean https://reproducible-builds.org/? And if so what is involved in "providing" one? As of Go 1.21 we expect our toolchains will be fully reproducible even when cross-compiling. (That is, if you build a Mac toolchain on Windows, Linux, and Mac, you get the same bits out in all cases.) I would be delighted to have a non-Google project reproducing our builds in some way.


Then @Foxboron replied:

regarding "Reproducible Builds", by that do you mean https://reproducible-builds.org/?

Yes. I have been working on this project since 2017 for Arch Linux.

And if so what is involved in "providing" one?

If this gets implemented we would be downloading binary toolchains, right? I want to reproduce the binaries distributed by Google.

Just checking out the source and building versions won't necessarily be enough, so there needs to be some attestation or SBOMs published to support the distribution of the binaries.

I'm not saying this can't be done. I'm just trying to point out that the bar for the "reproducible builds" Go already facilitates with source code is very different from what you would need to ensure for binary builds.

I would be delighted to have a non-Google project reproducing our builds in some way.

I'm not sure if "our builds" is the distributed binaries from Google? But Arch has been publishing verifiable builds of the Go compiler for 2 or 3 years now.


Moving this conversation to a new issue.

@rsc rsc added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 6, 2022
@rsc rsc added this to the Unreleased milestone Dec 6, 2022
@rsc rsc self-assigned this Dec 6, 2022
@Foxboron
Contributor

Foxboron commented Dec 6, 2022

Let's be clear that I don't actually know how reproducible any binary artifacts from Go actually are at the moment. I'm just expressing that things need to be more rigorous when dealing with binary artifacts.

@seankhliao
Member

see also #24904

@rsc
Contributor Author

rsc commented Dec 6, 2022

Just checking out the source and building versions won't necessarily be enough, so there needs to be some attestation or SBOMs published to support the distribution of the binaries.

When you say "won't necessarily be enough", I assume you mean it won't produce the same bits, and that the extra attestation or SBOMs would be needed because they contain the extra build configuration information to get the same bits out. That's absolutely true today, and I don't think there's too much benefit to chasing a reproduction of releases before Go 1.21. Right now the directory where the build is run leaks into the binaries, and we ship pre-built compiled archives that have other leakage, and we ship two binaries that are built in part using the host C compiler, which has its own leakage. Starting in Go 1.20 we are dropping the pre-compiled archives, and in Go 1.21 we will drop the use of the host C compiler, at which point Go controls all the bits that are generated, and there are just a few steps to make them truly reproducible, namely cutting out the build directory root and avoiding backslashes in partial file paths on Windows.

All that work is pending to land once Go 1.21 development begins. At that point the distributions will be really, truly, reproducible from only the source commit. The bootstrap toolchain doesn't leak into the distribution and as of Go 1.21 neither will the directory where the build happens, nor which operating system ran the build. At that point, anyone should be able to check out the go1.21 tag in the repo, grab a new enough bootstrap toolchain (Go 1.21 will require Go 1.17 or later, same as Go 1.20 does), run the build, and get bit-for-bit identical results.

If there are attestations or SBOMs required to support some kind of process, I'd be happy to look into that, but it won't be necessary to reproduce the bits.

Go binaries have always been highly reproducible on a single machine environment (fixed build directory, architectures, host C compiler), because we use build input content hashes to identify up-to-date-ness. If the build is not reproducible locally, the hashes don't converge. The most common way this would happen is if some detail of the bootstrap toolchain leaked into the compiler binary, so that building itself once and building itself twice produce different results. That convergence is tested in every toolchain build, so we shake those out as soon as they creep in. It's been quite a while since the last one. What will be new in Go 1.21 is removing the "single machine environment" limitation.
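The content-hash idea in the comment above can be illustrated with a minimal sketch. It is not the go command's actual cache key: the `actionID` helper, the `compile-v1` tool names, and the record layout are all assumptions made for illustration. The point is only that when every declared input is hashed in a fixed order, identical inputs converge on the same ID, while any leaked difference (here, a different tool identity) changes it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// actionID returns a hex SHA-256 over everything declared to be a build
// input: the tool identity, the flags, and each source file's name and
// content hash. File names are sorted so map iteration order cannot leak in.
// Hypothetical helper, not the go command's real implementation.
func actionID(toolID string, flags []string, files map[string][]byte) string {
	h := sha256.New()
	fmt.Fprintf(h, "tool %q\n", toolID)
	for _, f := range flags {
		fmt.Fprintf(h, "flag %q\n", f)
	}
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		fmt.Fprintf(h, "file %q %x\n", name, sum)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	src := map[string][]byte{
		"main.go": []byte("package main\n\nfunc main() {}\n"),
		"util.go": []byte("package main\n"),
	}
	first := actionID("compile-v1", []string{"-trimpath"}, src)
	second := actionID("compile-v1", []string{"-trimpath"}, src)
	other := actionID("compile-v2", []string{"-trimpath"}, src)

	fmt.Println("same inputs converge:", first == second)
	fmt.Println("different tool changes the ID:", first != other)
}
```

If the tool identity were itself derived from the compiler binary's contents, a bootstrap detail leaking into that binary would show up as IDs that never settle between successive self-builds, which is the kind of non-convergence the comment describes.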

@rsc
Contributor Author

rsc commented Dec 6, 2022

Thanks for the pointer @seankhliao. Added that issue number to my pending CL and also marked that issue for Go 1.21.

@Foxboron
Contributor

Foxboron commented Dec 6, 2022

If there are attestations or SBOMs required to support some kind of process, I'd be happy to look into that, but it won't be necessary to reproduce the bits.

I'll be happy to test and validate any reproducibility claims the Go binary distribution is making. I have spent quite a bit of time working on these sorts of issues.

Obviously cgo and the external linker is a harder target for reproducibility.

#53528

@rsc
Contributor Author

rsc commented Dec 6, 2022

Indeed, cgo and the external linker is a much harder target.
The plan is to stop using cgo to build the Go distribution itself (#57007).

@rsc
Contributor Author

rsc commented Dec 6, 2022

I'll be happy to test and validate any reproducibility claims the Go binary distribution is making. I have spent quite a bit of time working on these sorts of issues.

Thanks very much. We will open Go 1.21 development in February. I'll ping this issue once there is something to try.

@cherrymui
Member

Sorry if this has been discussed elsewhere.

When we talk about "reproducible build", do we want to specify what exactly are considered as input, and what are not? For example, the program's source code is an input, so are the source code version, the toolchain version, some configurations like the target GOOS and GOARCH. The current date and time are non-inputs. What about the host GOOS and GOARCH, the source code location and toolchain location, some other environment variables, etc.?

Under a narrow definition, "reproducible build" could be interpreted as: building the same program twice with exactly the same configuration yields exactly the same output. In that sense the last category could be considered input. Under a stronger definition, the last category is probably not input. This issue may be about eliminating the last category as input. It would be good to specify that more clearly.

@dolmen
Contributor

dolmen commented Feb 7, 2023

@cherrymui
I think that a good target for reproducible builds is to be able to fully reproduce a binary, bit for bit, knowing only the output of go version -m applied to it.

@rsc
Contributor Author

rsc commented Feb 7, 2023

This issue is specifically about reproducible builds for the Go toolchain distributed on https://go.dev/dl, not for arbitrary Go binaries. For that context, the relevant inputs are a Go source tree with a VERSION file, a GOOS, and a GOARCH. The host GOOS/GOARCH does not matter - the goal is to be able to reproduce builds no matter which OS compiled them. Other environment variables like CGO_ENABLED, CC, and so on matter but are left unset by our toolchain generation, so we can ignore them for reproducing the official downloads.

@rsc
Contributor Author

rsc commented Feb 7, 2023

@dolmen For arbitrary binaries, (1) you need to compile them with -trimpath or else put the source in the same directory as it was built with, and (2) you need to compile with CGO_ENABLED=0 or else arrange to have exactly the same C compiler and C libraries. If you can satisfy those two conditions, and then you use the go version output to get the right toolchain and Go source files, then you should get a reproducible build. That's not what this issue is about though. (For the Go toolchain itself we compile the commands with -trimpath and CGO_ENABLED=0.)
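As a rough illustration of the two conditions above, the sketch below reads the build settings embedded in a Go binary (the same data `go version -m` prints) with the standard `debug/buildinfo` package and flags settings that would make a rebuild depend on the original machine. The `warningsFor` helper and its messages are assumptions made for this example, not part of any Go tool.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
)

// warningsFor returns one message per recorded build setting that, per the
// discussion above, ties the binary to the machine that built it.
// Hypothetical helper; settings maps setting keys to values.
func warningsFor(settings map[string]string) []string {
	var warnings []string
	if settings["-trimpath"] != "true" {
		warnings = append(warnings, "built without -trimpath: source directory paths are embedded")
	}
	if settings["CGO_ENABLED"] != "0" {
		warnings = append(warnings, "CGO_ENABLED is not recorded as 0: output may depend on the host C toolchain")
	}
	return warnings
}

func main() {
	// Inspect the binary named on the command line, or this program itself.
	var path string
	if len(os.Args) > 1 {
		path = os.Args[1]
	} else {
		self, err := os.Executable()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		path = self
	}

	info, err := buildinfo.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	settings := make(map[string]string)
	for _, s := range info.Settings {
		settings[s.Key] = s.Value
	}

	fmt.Println("toolchain:", info.GoVersion)
	for _, w := range warningsFor(settings) {
		fmt.Println("warning:", w)
	}
}
```

A binary with no warnings is not thereby proven reproducible; the check only confirms that the two preconditions named above were recorded at build time.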

@rsc
Contributor Author

rsc commented Mar 6, 2023

@Foxboron, I posted https://swtch.com/tmp/go1.21repro4.src.tar.gz with a source tree containing the changes for reproducible builds for the upcoming Go 1.21 release (still in development). If you build it using the standard process (./make.bash) you should get the same binaries that are in https://swtch.com/tmp/go1.21repro4.linux-amd64.tar.gz or substitute a different GOOS-GOARCH in that URL. If you use ./make.bash -distpack you should get in ../pkg/distpack the exact archive at that URL. Like any Go toolchain build, the process requires a sufficiently new Go bootstrap toolchain (Go 1.17.13 or later) in $GOROOT_BOOTSTRAP (default $HOME/sdk/go1.17.13 or $HOME/go1.17.13 or $HOME/go1.4, whichever exists). There are no other requirements of the host system.

You said earlier that you'd be happy to test and validate any reproducibility claims. Can you check that you can reproduce that build? And assuming you can reproduce this specific distribution, what is the process for adding Go to Reproducible Builds once the official Go 1.21 is released?

@Foxboron
Contributor

Foxboron commented Mar 6, 2023

@rsc, it will be a couple of days before I'll look at this. Currently recovering from a fever.

Adding the Go project to reproducible-builds.org is just a matter of adding it to the homepage. https://salsa.debian.org/reproducible-builds/reproducible-website

@Foxboron
Contributor

Foxboron commented Mar 10, 2023

Building the above source with ./make.bash using the Go compiler shipped with Arch (2:1.20.1-1) produces the same checksum as the binaries from the go1.21repro4.linux-amd64.tar.gz archive.

λ bin » sha256sum *
c400a53988aaf4dbbf31cb2f1adef839457e996209b3ce86d239c12acf72d270  go
2767fa3b986d6a1799ee8d6340790595a2bb88d77670ceeba0abbd348826f124  gofmt

What's the plan to ensure there are no regressions between releases?
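The checksum comparison above can also be scripted. The following is a minimal sketch, assuming the rebuilt binaries sit in a directory named `bin` (or one passed as the first argument); the expected digests are the two reported in this comment, and the `sha256Hex` and `mismatches` helpers are names invented for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// sha256Hex returns the lowercase hex SHA-256 digest of data,
// in the same format that sha256sum prints.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// mismatches reads each named file under dir and returns, in sorted order,
// the names whose digest differs from the expected one or that cannot be read.
func mismatches(dir string, want map[string]string) []string {
	var bad []string
	for name, digest := range want {
		data, err := os.ReadFile(filepath.Join(dir, name))
		if err != nil || sha256Hex(data) != digest {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// Digests reported above for the go1.21repro4 linux-amd64 build.
	want := map[string]string{
		"go":    "c400a53988aaf4dbbf31cb2f1adef839457e996209b3ce86d239c12acf72d270",
		"gofmt": "2767fa3b986d6a1799ee8d6340790595a2bb88d77670ceeba0abbd348826f124",
	}

	dir := "bin" // assumed location of the rebuilt binaries
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}

	bad := mismatches(dir, want)
	if len(bad) == 0 {
		fmt.Println("all digests match")
		return
	}
	fmt.Println("mismatched or unreadable:", bad)
}
```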

@Foxboron
Contributor

I can probably also run this through a few toolbox images if you want me to check multiple distributions.

@rsc
Contributor Author

rsc commented Mar 13, 2023

@Foxboron, thanks for confirming that you can reproduce the build. That's great. I'm not too worried about testing lots of other distributions, especially since we can reproduce that go1.21repro4.linux-amd64.tar.gz from Windows and macOS too.

Our current thinking for avoiding regressions in releases is to build releases on two fairly different machines (e.g., a Linux machine and a Windows machine) and confirm that they match before issuing a release.

When I look at https://reproducible-builds.org/citests/, it appears that the top bunch are running regular tests on infrastructure run by the Reproducible Builds project. Once Go 1.21 is released (or at least go1.21rc1 is out), would it make sense for us to prepare a small repo containing a script that could be run on that infrastructure to reproduce the archives posted on https://go.dev/dl/? We could run it ourselves and be listed under "External tests" of course, but it seems like running on non-Google-owned infrastructure would be a stronger statement. What do you think?

@h01ger

h01ger commented Mar 13, 2023

really great to read up on this issue and see the progress! kudos & thank you.

one tiny comment from my side:

@rsc, it will be a couple of days before I'll look at this. Currently recovering from a fever.

Adding the Go project to reproducible-builds.org is just a matter of adding it to the homepage. https://salsa.debian.org/reproducible-builds/reproducible-website

and in there one file needs to be edited: _data/projects.yml, where it just needs a YAML entry like, e.g., this one for F-Droid.

I'd either happily merge a MR or take the data from this issue ;)

@h01ger

h01ger commented Mar 13, 2023

oh, and for testing on https://reproducible-builds.org/citests/ it's automated, and it's easiest if you do a release which then gets updated into Debian or Arch Linux or openSUSE.

@Foxboron
Contributor

Once Go 1.21 is released (or at least go1.21rc1 is out), would it make sense for us to prepare a small repo containing a script that could be run on that infrastructure to reproduce the archives posted on https://go.dev/dl/? We could run it ourselves and be listed under "External tests" of course, but it seems like running on non-Google-owned infrastructure would be a stronger statement. What do you think?

You could run it on the github CI/CD infra on each release? That + the google infra would be a nice statement to begin with.

I'll probably write my own monitor for this, and then it might be worth trying to host something on reproducible-builds.org in the future.

@rsc
Contributor Author

rsc commented Mar 14, 2023

I like the cron-based GitHub Actions idea. Thanks.

@quite

quite commented Apr 11, 2023

@dolmen For arbitrary binaries, (1) you need to compile them with -trimpath or else put the source in the same directory as it was built with, and (2) you need to compile with CGO_ENABLED=0 or else arrange to have exactly the same C compiler and C libraries. If you can satisfy those two conditions, and then you use the go version output to get the right toolchain and Go source files, then you should get a reproducible build. That's not what this issue is about though. (For the Go toolchain itself we compile the commands with -trimpath and CGO_ENABLED=0.)

I'd like to add that in addition to this, -buildvcs=false seems to be needed (or else some info from any VCS gets opportunistically baked into the binary, right?).

@newhinton

Sorry to intrude in this discussion, but does this mean that as of Go 1.19 reproducible builds are not possible while cross-compiling? I am trying to create a reproducible Android app that includes rclone, but I don't know how to make the Go output reproducible. If this is not the right place, please point me in the right direction. Thanks!

@bcmills
Contributor

bcmills commented May 18, 2023

@newhinton, reproducible builds are possible in general by building with CGO_ENABLED=0 and -trimpath. However, note that on android platforms other than android/arm64, we currently require an external linker — the resulting Go binaries will only be reproducible if the system C linker is at a fixed version and supports reproducible builds.

@newhinton

Ah okay! I am using CGO_ENABLED=1; that might explain it. I will have to figure out how to do it without.

we currently require an external linker

Will this change with v.1.21 as stated by the original post?

@rsc
Contributor Author

rsc commented May 19, 2023

No, the changes in Go 1.21 are focused on builds of the main Go toolchain not builds of other programs. I doubt very much that Android will work with internal linking any time soon. Does the Android C toolchain not support reproducible builds?

@newhinton

newhinton commented May 19, 2023

Does the Android C toolchain not support reproducible builds?

I assume so, but to be fair, this is my first time creating a reproducible build. The weird thing to me is that while the env vars are the same in my two build environments (including the clang version, which is supplied by Android/Google itself and version-locked), I get two different binaries, and one contains a .hash section while the other does not. I am entirely unsure where to go from here, but I understand that this might not be the right place. Any help is appreciated though!

Edit: We found the issue! I did not pass the linker-options properly, now it works and is reproducible! Thanks for your help!

@agambier

agambier commented Jul 3, 2023

@newhinton Could you share the linker options you used? Thanks

@gopherbot

Change https://go.dev/cl/513975 mentions this issue: cmd/gorebuild: add tool to reproduce posted Go binaries

@gopherbot

Change https://go.dev/cl/513700 mentions this issue: _content: add rebuild page with reproducible build information

@newhinton

@agambier
You can see the configuration in the corresponding gradle file:

https://github.com/newhinton/Round-Sync/blob/master/rclone/build.gradle

gopherbot pushed a commit to golang/build that referenced this issue Aug 2, 2023
This command rebuilds or verifies all the artifacts posted on
go.dev/dl for the latest supported releases (the last patch of
the last two major releases, plus the most recent release candidate
if we're approaching a new release).

It is meant to be run by the Go team to update a status page
that can be linked from reproducible-builds.org, but it is also
meant to be run by anyone who wants to "trust but verify" the
status page themselves.

For golang/go#57120.
For golang/go#58884.

Change-Id: I80a70275c1821a66b6219d24f29c2d11bfe464a8
Reviewed-on: https://go-review.googlesource.com/c/build/+/513975
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
@gopherbot

Change https://go.dev/cl/515415 mentions this issue: cmd/gorebuild: check uid/gid/uname/gname/mtime fields in tgz files
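To show why those tar header fields matter for reproducibility, here is a minimal sketch of such a check using the standard `archive/tar` and `compress/gzip` packages. It is not gorebuild's code. The policy it enforces (numeric IDs of 0, empty user and group names, one fixed mtime) is an assumption made for illustration, as are the `fieldProblems` and `checkTgz` names and the sample entries.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"time"
)

// fieldProblems lists the ownership fields of one entry that differ from the
// normalized values assumed here: numeric IDs of 0 and empty names.
func fieldProblems(uid, gid int, uname, gname string) []string {
	var p []string
	if uid != 0 {
		p = append(p, "uid")
	}
	if gid != 0 {
		p = append(p, "gid")
	}
	if uname != "" {
		p = append(p, "uname")
	}
	if gname != "" {
		p = append(p, "gname")
	}
	return p
}

// checkTgz walks a gzip-compressed tar stream and returns one message per
// entry whose metadata would vary with the machine or time of the build.
func checkTgz(r io.Reader, wantMtime time.Time) ([]string, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	tr := tar.NewReader(zr)
	var report []string
	for {
		h, err := tr.Next()
		if err == io.EOF {
			return report, nil
		}
		if err != nil {
			return nil, err
		}
		problems := fieldProblems(h.Uid, h.Gid, h.Uname, h.Gname)
		if !h.ModTime.Equal(wantMtime) {
			problems = append(problems, "mtime")
		}
		if len(problems) > 0 {
			report = append(report, fmt.Sprintf("%s: %v", h.Name, problems))
		}
	}
}

func main() {
	// Build a tiny archive in memory; the second entry leaks a uid and uname.
	stamp := time.Unix(1700000000, 0) // arbitrary fixed timestamp for the demo
	entries := []tar.Header{
		{Name: "go/VERSION", Mode: 0o644, Size: 2, ModTime: stamp},
		{Name: "go/bin/go", Mode: 0o755, Size: 2, ModTime: stamp, Uid: 1000, Uname: "builder"},
	}

	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	tw := tar.NewWriter(zw)
	for i := range entries {
		if err := tw.WriteHeader(&entries[i]); err != nil {
			panic(err)
		}
		if _, err := tw.Write([]byte("ok")); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}

	report, err := checkTgz(&buf, stamp)
	if err != nil {
		panic(err)
	}
	for _, line := range report {
		fmt.Println(line)
	}
}
```

In this demo only the second entry is reported, because its uid and uname reflect a hypothetical build user rather than normalized values.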

@gopherbot

Change https://go.dev/cl/515356 mentions this issue: cmd/gorebuild: add gorebuild version to report

gopherbot pushed a commit to golang/build that referenced this issue Aug 2, 2023
Issue 61513 is resolved so this path can be turned on now.
Confirmed to still pass now that go1.21rc4 is out. It was
the first release built using improvements from CL 512437.

For golang/go#57120.
For golang/go#58884.
For golang/go#61513.

Change-Id: Ie39765f8c7ba514dea2bfccf7c8ef8acc5822a22
Reviewed-on: https://go-review.googlesource.com/c/build/+/515415
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Aug 3, 2023
For golang/go#57120.

Change-Id: Ic741fe1d856a9d853f25288ce29ad40a289653ef
Reviewed-on: https://go-review.googlesource.com/c/build/+/515356
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
gopherbot pushed a commit to golang/website that referenced this issue Aug 3, 2023
We now have a command to reproduce Go builds posted on go.dev/dl.
Add a dashboard that people can check to see its results.
We should be able to link to this page from https://reproducible-builds.org/citests/.

For golang/go#57120.
For golang/go#58884.

Change-Id: I0bd1f9c26a9a003aa1f301125083195fdeb048b4
Reviewed-on: https://go-review.googlesource.com/c/website/+/513700
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot

Change https://go.dev/cl/515455 mentions this issue: cmd/golangorg: fix CachedURL, update rebuild template

gopherbot pushed a commit to golang/website that referenced this issue Aug 3, 2023
The "not modified" response code is 304, not 206. Oops.
Use named constants to avoid similar mistakes in the future.

Also update rebuild template to show more version information.

For golang/go#57120.
For golang/go#58884.

Change-Id: I2c3ddf25cede0b5a853fa971226463a997f168c7
Reviewed-on: https://go-review.googlesource.com/c/website/+/515455
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
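The status-code mix-up described in this commit message is easy to illustrate with the named constants in `net/http`. The sketch below is not the golang.org website code: the handler, the `/rebuild.json` path, the ETag value, and the `isNotModified` helper are assumptions made for the example. It runs the handler against an in-memory recorder, so no network access is needed.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// isNotModified reports whether code is HTTP 304 Not Modified. Using the
// named constant avoids confusing it with 206 (http.StatusPartialContent).
func isNotModified(code int) bool {
	return code == http.StatusNotModified
}

func main() {
	const etag = `"v1"` // hypothetical validator for the cached resource

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// A conditional request whose validator still matches gets 304 and no body.
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("ETag", etag)
		fmt.Fprintln(w, "fresh content")
	})

	// Simulate a client revalidating its cached copy.
	req := httptest.NewRequest(http.MethodGet, "/rebuild.json", nil)
	req.Header.Set("If-None-Match", etag)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)

	fmt.Println("status:", rec.Code, http.StatusText(rec.Code))
	fmt.Println("cached copy still valid:", isNotModified(rec.Code))
}
```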
@rsc
Contributor Author

rsc commented Aug 3, 2023

Update on this: https://go.dev/rebuild exists now, and I sent https://salsa.debian.org/reproducible-builds/reproducible-website/-/merge_requests/98 to add it to the Reproducible Builds web site.

@Foxboron
Contributor

Foxboron commented Aug 3, 2023

Cool work on this. Would be interesting if other compiler developers would follow up with something similar :)

@rsc
Contributor Author

rsc commented Aug 3, 2023

This has been merged, and Go is now listed on https://reproducible-builds.org/who/projects/ and https://reproducible-builds.org/citests/.

@rsc rsc closed this as completed Aug 3, 2023
@gopherbot

Change https://go.dev/cl/517515 mentions this issue: cmd/gorebuild: drop "@" prefix in defaultVersions

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 9, 2023
gopherbot pushed a commit to golang/build that referenced this issue Aug 9, 2023
When arguments are provided to gorebuild, the "@" character can be used
to specify a version. Otherwise version selection happens automatically
via defaultVersions. Its output are Go versions, no need for any prefix.

Fixes the error preventing gorebuild from running when versions are not
explicitly provided via arguments:

	$ gorebuild
	18:05:05.812 downloaded https://go.dev/dl/?mode=json&include=all
	18:05:05.836 FAIL: unknown version "@go1.21.0"

For golang/go#57120.

Change-Id: I050bd9d6d12d89b6891c845e686326c87eae5716
Reviewed-on: https://go-review.googlesource.com/c/build/+/517515
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@gopherbot

Change https://go.dev/cl/556075 mentions this issue: cmd/rebuild: report files missing from posted archive better

gopherbot pushed a commit to golang/build that referenced this issue Jan 19, 2024
The "missing from posted archive" case was checking the wrong variable
and could never trigger. Fortunately, it's fairly harmless, as missing
files would still be caught by gorebuild thanks to check hitting a nil
pointer dereference trying to compare the missing file.

Check the right variable to fix the panic, and print the intended text.

For golang/go#57120.
For golang/go#58884.

Change-Id: I4560a9cc6c53bca37283c004826d728e175a1ff1
Reviewed-on: https://go-review.googlesource.com/c/build/+/556075
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>