cmd/go,cmd/link: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID #64947
Workaround to get Xcode on a gomote: Note: Depending on which machine you get, the
|
Might have something to do with code-signing? (But then why aren't those tests failing on the |
With Xcode installed, this (thankfully) does reproduce (no pun intended):
|
Complete recipe: Note: Depending on which machine you get, the
|
The only differences between a.so and b.so are something near the beginning of the file (still investigating) and the Go Build ID:
|
Based on the otool output, it looks like this other component is the I don't know MachO very well, but it seems that this is just another build ID... |
Huh. See also https://bugs.chromium.org/p/chromium/issues/detail?id=1068970. 😵💫 |
The output of |
Yeah, looks like the I'm not sure what |
Thanks for the reference! Looking at the
Running the link step (first line) multiple times, even without changing the path, yields a
Diffing the binary shows that the UUID is the only difference (Go Build ID is identical presumably because I'm not running the buildid command). |
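To make the "only difference" claim easy to check on any pair of outputs, here is a minimal Go sketch that lists the byte ranges where two files differ. The names `span` and `diffSpans` are hypothetical helpers written for this illustration, not part of cmd/link or any tool mentioned in this thread.

```go
package main

import (
	"fmt"
	"os"
)

// span is a half-open byte range [start, end) in which the two inputs differ.
type span struct{ start, end int }

// diffSpans returns the maximal runs of differing bytes between a and b.
// If the lengths differ, the extra tail of the longer input is reported
// as one final span.
func diffSpans(a, b []byte) []span {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var out []span
	for i := 0; i < n; {
		if a[i] == b[i] {
			i++
			continue
		}
		j := i
		for j < n && a[j] != b[j] {
			j++
		}
		out = append(out, span{i, j})
		i = j
	}
	if len(a) != len(b) {
		longer := len(a)
		if len(b) > longer {
			longer = len(b)
		}
		out = append(out, span{n, longer})
	}
	return out
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: diffspans FILE1 FILE2")
		os.Exit(2)
	}
	a, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, s := range diffSpans(a, b) {
		fmt.Printf("bytes [%#x, %#x) differ\n", s.start, s.end)
	}
}
```

If the UUID really is the only difference, this should report a single span of at most 16 bytes near the start of the file, inside the load commands.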
It doesn't seem to be related to the file paths. cmd/link invokes clang like so:
The
Even with identical paths each time we get different output. |
Maybe we can just set the |
Perhaps, but I'd like to better understand what is happening. Plus it seems like some users may want the UUID, as Chrome did. FWIW, the
(Edit: I'm not 100% certain about dsymutil being at fault here, as I can't seem to reproduce the non-reproducibility when running clang + dsymutil manually.) |
cc @thanm see #64947 (comment) for reproducer instructions |
I spent a little while looking at this. What's weird is that the actual DWARF in the two
So basically what seems to be happening is that dsymutil is generating a different uuid each time and embedding it into the go.dwarf file, in spite of the fact that the dwarf is the same, hmm. I will spend a little time digging into the dsymutil source code, maybe I can find out more. |
FWIW, the version of Xcode we're installing is 15.0.0. I peeked at the release notes for 15.0.1 and 15.1 and nothing stood out as a fix for this kind of issue, but I'll see if we can get a different version to test. |
OK (duh) in fact dsymutil is just faithfully copying the uuid from its input, so the problem here is that clang is generating a different uuid. I'll look into the clang source code instead. |
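For anyone who wants to compare the UUID in the linker output against the one dsymutil writes, the following Go sketch reads LC_UUID with the standard `debug/macho` package. It assumes the usual `uuid_command` layout (a 4-byte command, a 4-byte size, then 16 UUID bytes) and the value 0x1b for LC_UUID from Apple's `mach-o/loader.h`; `debug/macho` does not export a constant for it, so the constant and the helper names below are defined here purely for illustration.

```go
package main

import (
	"debug/macho"
	"fmt"
	"os"
)

// loadCmdUUID is LC_UUID from <mach-o/loader.h>. It is declared locally
// because debug/macho has no exported constant for this command.
const loadCmdUUID = 0x1b

// parseUUIDCommand returns the 16 UUID bytes of a load command, given its
// already-decoded command number and its raw bytes. The boolean is false if
// the command is not LC_UUID or the raw data is too short.
func parseUUIDCommand(cmd uint32, raw []byte) ([16]byte, bool) {
	var id [16]byte
	if cmd != loadCmdUUID || len(raw) < 24 {
		return id, false
	}
	copy(id[:], raw[8:24]) // skip the 4-byte cmd and 4-byte cmdsize fields
	return id, true
}

// readUUID opens a Mach-O file and returns the UUID from its first LC_UUID
// load command.
func readUUID(path string) ([16]byte, error) {
	f, err := macho.Open(path)
	if err != nil {
		return [16]byte{}, err
	}
	defer f.Close()
	for _, l := range f.Loads {
		raw := l.Raw()
		if len(raw) < 8 {
			continue
		}
		cmd := f.ByteOrder.Uint32(raw[:4])
		if id, ok := parseUUIDCommand(cmd, raw); ok {
			return id, nil
		}
	}
	return [16]byte{}, fmt.Errorf("%s: no LC_UUID load command", path)
}

func main() {
	for _, p := range os.Args[1:] {
		id, err := readUUID(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %X\n", p, id[:])
	}
}
```

Running it on both the linked binary and the DWARF file dsymutil produces should print the same UUID if dsymutil is only copying it through.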
FWIW, it looks like there are more versions of Xcode available to try out, though I haven't tested them:
|
OK, I think I am making some progress here. For a while I thought this might be an Running the link with
Note the "-o" output, which incorporates the go build dir
and the external linker is almost certainly going to hash this section when creating the build ID. Not sure what the best approach is to fix this. Also not sure why we aren't seeing similar problems with the older gomotes (I will spin one up and compare). |
@thanm, that sounds very similar to an existing reproducibility workaround here: Perhaps we need to extend that workaround to more build modes, or take a similar approach when running other commands? |
Well phooey, I am afraid I've had a Homer Simpson moment here. My gomote expired, and I created a new one, but when I started using the new one I didn't update the PATH setting in my script, so it wasn't picking up the correct version of Go. It looks like with LUCI gomotes the location of GOROOT is slightly different each time:
Oh well, a learning experience I suppose. That explains why I was not picking up Cherry's fix (https://go-review.googlesource.com/c/go/+/478196, which extends the workaround that you mention Bryan). Now I'm back to seeing only a difference in the UUID. |
One more important bit of info: problem goes away if I build with Looking at the setup we have on our old-style gomotes I see:
i.e. command line tools, not a complete Xcode installation. For the new LUCI gomotes we obviously have a full Xcode install, and we're using version 15, which defaults to ld-prime. |
I just tested the most recent one (15.2) and it appears to have the same problem. Hmph. |
OK, one more update. I can reproduce the problem with just the C compiler, and what I think must be going on is that the name of the output file is being incorporated into the UUID. If I do:
then I see a difference, whereas if I instead do
The UUIDs are the same (the only thing different in the second example is that both builds target How would we feel about changing the test in question to target the same filename? Or does the current ld-prime behavior not really meet our criteria for reproducible builds? |
Huh. I guess it would be ok for the tests to Does the |
I checked just now and it looks like it is just the output file basename, not the directory. If I run the C compiler building example.cpp once in directory xxx, then do the same compile in directory yyy, I get identical binaries. I'll send a CL, although I agree it is a bit weird. |
I poked a bit at the other failure (TestScript/build_issue48319). That one looks like it will require another Go command fix -- since this is not a shared-mode build the "-o" argument being passed to the external linker is a full path. Hence the difference in build IDs. @bcmills would it make sense to take the code you mentioned before (https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/work/gc.go;l=649-663;drc=66b8107a26e515bbe19855d358bdf12bd6326347) and extend it even further (e.g. any link being done on Darwin)? |
Change https://go.dev/cl/554059 mentions this issue: |
If it is only a test issue, and a user's normal "go build" (by default, without any extra weird flags) is still reproducible, I think it is okay to just update the test. Another option is that we (over)write the LC_UUID in the Go linker after C linking, based on the file content (or the Go build ID). (We overwrite the binary for DWARF combining anyway, but that may be changed with #62577.) For the other issue, if it is not a shared object there would be no LC_ID_DYLIB, so it is still the UUID that is affected by the output file path? |
If I understand correctly, it's in an awkward grey area: the build is “reproducible” but only if you specify an output filename with the same basename at each invocation. That is:
will produce a
will not. Since |
We could have the Go linker always pass, say |
Perhaps the most reliable way is to just write the UUID ourselves. For short term (and possibly backport), we can work around it in the test. |
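As a rough illustration of what "write the UUID ourselves" could mean, the Go sketch below derives 16 deterministic bytes from some input (the file content or the Go build ID, as suggested above) and stamps UUID version and variant bits on them. The function name `uuidFromContent`, the use of SHA-256, and the choice of version nibble 3 are all assumptions of this sketch, not what cmd/link does; patching the bytes into the LC_UUID load command is left out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// uuidFromContent derives a deterministic 16-byte UUID from data.
// Identical input always yields an identical UUID, independent of output
// file names or timestamps.
func uuidFromContent(data []byte) [16]byte {
	sum := sha256.Sum256(data)
	var id [16]byte
	copy(id[:], sum[:16])
	// Set the version nibble to 3 to label the UUID as hash-derived rather
	// than random or time-based. The exact version is a choice made for
	// this sketch.
	id[6] = id[6]&0x0f | 0x30
	// Set the two high bits of byte 8 to 10, the RFC 4122 variant.
	id[8] = id[8]&0x3f | 0x80
	return id
}

func main() {
	// A placeholder input; a real linker would hash the build ID or the
	// linked file content instead.
	id := uuidFromContent([]byte("example build ID"))
	fmt.Printf("%X\n", id[:])
}
```

The trade-off is that the UUID then identifies the hashed input rather than the exact bytes the Apple linker emitted, which seems acceptable if the goal is that identical builds get identical UUIDs.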
Circling back on this: when I ran the experiment using clang (described in #64947 (comment)) I assumed that the same fix would work for "go build", but in fact it looks like the UUIDs are still different, so I think there is still some work to do here. Apologies, I have been busy working on other bugs, I will spend some more time on it later this afternoon. |
Tiny bit more progress: I hacked the Go linker code to save off a copy of the original "a.out.so" produced by the Apple linker before it gets run through dsymutil and then strip. When I compare the two instances of the original a.out.so (e.g. before being stripped) I can see differences in the symbol table output:
So this explains why we have this bizarre situation where the only thing different in the final binary is the build ID. These phantom symbols are really weird, too: their value doesn't seem to be meaningful at all (normally an undef symbol has a zero value?), and doesn't correspond to anything in the section table. I will see if I can reproduce this same weirdness with a pure C example -- if this works then we can file a bug against the Apple linker. |
OK, I have successfully reproduced the problem with a C program via:
Diffing the objdump output from a.so and b.so above produces
I think Cherry's idea of rewriting the build ID is looking a bit more attractive at this point. |
Well, it seems the values of those symbols aren't really arbitrary:
It is actually the timestamp of the .o file (!)... I guess one way to work around it is to zero the timestamps of the .o files before feeding them to the C linker... |
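A minimal Go sketch of the timestamp-zeroing idea follows, assuming the linker picks the value up from the object file's modification time on disk (which is what the observation above suggests, but is not confirmed here). `pinModTimes` and `selfCheck` are hypothetical helper names; the sketch uses only `os.Chtimes` and `os.Stat` from the standard library.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// pinModTimes sets the access and modification time of every path to the
// given Unix time in seconds, so later tools see a fixed timestamp.
func pinModTimes(paths []string, unixSec int64) error {
	t := time.Unix(unixSec, 0)
	for _, p := range paths {
		if err := os.Chtimes(p, t, t); err != nil {
			return fmt.Errorf("pinning mtime of %s: %w", p, err)
		}
	}
	return nil
}

// selfCheck creates a scratch file, pins its mtime to the Unix epoch, and
// confirms through os.Stat that the change took effect.
func selfCheck() error {
	f, err := os.CreateTemp("", "pin-*.o")
	if err != nil {
		return err
	}
	name := f.Name()
	f.Close()
	defer os.Remove(name)

	if err := pinModTimes([]string{name}, 0); err != nil {
		return err
	}
	fi, err := os.Stat(name)
	if err != nil {
		return err
	}
	if got := fi.ModTime().Unix(); got != 0 {
		return fmt.Errorf("mtime is %d, want 0", got)
	}
	return nil
}

func main() {
	// With arguments, pin those files; without, run the self-check.
	if len(os.Args) > 1 {
		if err := pinModTimes(os.Args[1:], 0); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return
	}
	if err := selfCheck(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Whether a zero timestamp is enough to make the external link reproducible would still need to be verified against the affected Xcode versions.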
Maybe try setting |
Should we add skips for these tests while we figure out how to work around this? |
Probably a good idea. I sent a CL (565376). |
Change https://go.dev/cl/565376 mentions this issue: |
Skip two build reproducibility tests (build_issue48319 and build_plugin_reproducible) on Darwin if GO_BUILDER_NAME is set until issue 64947 can be resolved; on the LUCI darwin longtest builder the more contemporary version of Xcode is doing things that are unfriendly to Go's build reproducibility.

For #64947.

Change-Id: Iebd433ad6dfeb84b6504ae9355231d897d8ae174
Reviewed-on: https://go-review.googlesource.com/c/go/+/565376
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
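The skip condition described in that commit message can be expressed in Go roughly as below. This is only an illustration of the condition: the affected tests are cmd/go script tests, the CL's actual mechanism is not shown in this thread, and `skipReproTest` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// skipReproTest reports whether the reproducibility tests should be skipped:
// only on Darwin, and only when running on a builder (builder name set).
func skipReproTest(goos, builderName string) bool {
	return goos == "darwin" && builderName != ""
}

func main() {
	if skipReproTest(runtime.GOOS, os.Getenv("GO_BUILDER_NAME")) {
		fmt.Println("skip: LC_UUID is not reproducible on this builder (issue 64947)")
		return
	}
	fmt.Println("run reproducibility tests")
}
```

Keying the skip on the builder environment keeps the tests running for local developers, whose toolchains may not show the problem.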
See previously: |
(It is unclear to me if this is an issue with the test, cmd/go, the compiler/linker, or the builder itself)
Example failure: https://ci.chromium.org/ui/inv/build-8759926960216361809/test-results?sortby=&groupby=
Both tests are failing because they aren't getting a reproducible build.
I haven't yet been able to reproduce on a gomote because the LUCI gomote setup doesn't currently set up Xcode properly, so cgo doesn't work (which these tests require).
cc @bcmills @dmitshur @mknyszek @cagedmantis