cmd/go,cmd/link: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID #64947

Open
prattmic opened this issue Jan 3, 2024 · 41 comments
Labels: Builders, compiler/runtime, GoCommand, NeedsInvestigation, OS-Darwin

@prattmic
Member

prattmic commented Jan 3, 2024

(It is unclear to me if this is an issue with the test, cmd/go, the compiler/linker, or the builder itself)

Example failure: https://ci.chromium.org/ui/inv/build-8759926960216361809/test-results?sortby=&groupby=

Both tests are failing because they aren't getting a reproducible build.

    script_test.go:156: FAIL: testdata/script/build_issue48319.txt:29: cmp -q main.exe main1.exe: main.exe and main1.exe differ
    script_test.go:156: FAIL: testdata/script/build_plugin_reproducible.txt:6: cmp -q a.so b.so: a.so and b.so differ

I haven't yet been able to reproduce on a gomote because the LUCI gomote setup doesn't currently set up Xcode properly, so cgo doesn't work (which these tests require).

cc @bcmills @dmitshur @mknyszek @cagedmantis

@prattmic prattmic added Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go labels Jan 3, 2024
@prattmic
Member Author

prattmic commented Jan 3, 2024

Workaround to get Xcode on a gomote:

Note: Depending on which machine you get, the mac_toolchain binary referenced below may be at either /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain or /Volumes/Work/s/w/ir/tools/bin/mac_toolchain.

$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /bin/mkdir /tmp/xcode
$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain install -xcode-version 15a240d -output-dir /tmp/xcode/Xcode.app
$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /usr/bin/sudo xcode-select --switch /tmp/xcode/Xcode.app

@bcmills
Contributor

bcmills commented Jan 3, 2024

Might have something to do with code-signing? (But then why aren't those tests failing on the darwin-amd64-longtest legacy TryBots too?)

@prattmic
Member Author

prattmic commented Jan 3, 2024

With Xcode installed, this (thankfully) does reproduce (no pun intended):

$ gomote run mpratt-gotip-darwin-amd64-longtest-0 ./go/bin/go test -run=TestScript/build_plugin_reproducible -v cmd/go  
# Streaming results from "mpratt-gotip-darwin-amd64-longtest-0" to "/tmp/gomote2019819704/mpratt-gotip-darwin-amd64-longtest-0.stdout"...
=== RUN   TestScript
vcs-test.golang.org rerouted to http://127.0.0.1:50941
https://vcs-test.golang.org rerouted to https://127.0.0.1:50942
go test proxy running at GOPROXY=http://127.0.0.1:50943/mod
=== RUN   TestScript/build_plugin_reproducible
=== PAUSE TestScript/build_plugin_reproducible
=== CONT  TestScript/build_plugin_reproducible
    script_test.go:132: 2024-01-03T19:36:00Z
    script_test.go:134: $WORK=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168
    script_test.go:156: 
        PATH=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/testbin:/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go/bin:/Users/swarming/.swarming/w/ir/tools/bin:/Users/swarming/.swarming/cipd_cache/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
        HOME=/no-home
        CCACHE_DISABLE=1
        GOARCH=amd64
        TESTGO_GOHOSTARCH=amd64
        GOCACHE=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/gocache
        GOCOVERDIR=
        GODEBUG=
        GOEXE=
        GOEXPERIMENT=
        GOOS=darwin
        TESTGO_GOHOSTOS=darwin
        GOPROXY=http://127.0.0.1:50943/mod
        GOPRIVATE=
        GOROOT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go
        GOROOT_FINAL=
        GOTRACEBACK=system
        TESTGO_GOROOT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go
        TESTGO_EXE=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/testbin/go
        TESTGO_VCSTEST_HOST=127.0.0.1:50941
        TESTGO_VCSTEST_TLS_HOST=127.0.0.1:50942
        TESTGO_VCSTEST_CERT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/vcstest63272679/cert.pem
        TESTGONETWORK=panic
        GOSUMDB=localhost.localdev/sumdb+00000c67+AcTrnkbUA+TU4heY3hkjiSES/DSQniBqIeQ/YppAUtK6
        GONOPROXY=
        GONOSUMDB=
        GOVCS=*:all
        devnull=/dev/null
        goversion=1.22
        CMDGO_TEST_RUN_MAIN=true
        HGRCPATH=
        GOTOOLCHAIN=auto
        newline=
        
        WORK=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168
        TMPDIR=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/tmp
        GOPATH=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/gopath
        PWD=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/gopath/src
        
        > [!buildmode:plugin] skip
        [condition not met]
        > [short] skip
        [condition not met]
        > go build -trimpath -buildvcs=false -buildmode=plugin -o a.so main.go
        > go build -trimpath -buildvcs=false -buildmode=plugin -o b.so main.go
        > cmp -q a.so b.so
    script_test.go:156: FAIL: testdata/script/build_plugin_reproducible.txt:6: cmp -q a.so b.so: a.so and b.so differ
--- FAIL: TestScript (0.10s)
    --- FAIL: TestScript/build_plugin_reproducible (8.78s)
FAIL
FAIL    cmd/go  9.089s
FAIL
# Wrote results from "mpratt-gotip-darwin-amd64-longtest-0" to "/tmp/gomote2019819704/mpratt-gotip-darwin-amd64-longtest-0.stdout".
Error running run: unable to execute ./go/bin/go: rpc error: code = Unknown desc = command execution failed: exit status 1

@prattmic
Member Author

prattmic commented Jan 3, 2024

Complete recipe:

Note: Depending on which machine you get, the mac_toolchain binary referenced below may be at either /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain or /Volumes/Work/s/w/ir/tools/bin/mac_toolchain.

$ export GOROOT=/home/prattmic/src/go/ # set to your GOROOT
$ export GOMOTELUCI=true
$ gomote create gotip-darwin-amd64-longtest
mpratt-gotip-darwin-amd64-longtest-1
$ export INSTANCE=mpratt-gotip-darwin-amd64-longtest-1
$ gomote run ${INSTANCE} /bin/mkdir /tmp/xcode
$ gomote run ${INSTANCE} /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain install -xcode-version 15a240d -output-dir /tmp/xcode/Xcode.app
$ gomote run ${INSTANCE} /usr/bin/sudo xcode-select --switch /tmp/xcode/Xcode.app
$ gomote push ${INSTANCE}
$ gomote run ${INSTANCE} ./go/src/make.bash
$ gomote run ${INSTANCE} ./go/bin/go test -run=TestScript/build_plugin_reproducible -v cmd/go

@prattmic
Member Author

prattmic commented Jan 3, 2024

The only differences between a.so and b.so are something near the beginning of the file (still investigating) and the Go Build ID:

diff -C 5 a.hex b.hex
*** a.hex       Wed Jan  3 12:03:42 2024
--- b.hex       Wed Jan  3 12:03:47 2024
***************
*** 117,128 ****
  00000740: 0b00 0000 5000 0000 0000 0000 a70b 0000  ....P...........
  00000750: a70b 0000 5603 0000 fd0e 0000 3300 0000  ....V.......3...
  00000760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000770: 0000 0000 0000 0000 6048 1700 5801 0000  ........`H..X...
  00000780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
! 00000790: 1b00 0000 1800 0000 edfb 2d7d ab6e 374d  ..........-}.n7M
! 000007a0: 8eba 7c75 012c c264 3200 0000 2000 0000  ..|u.,.d2... ...
  000007b0: 0100 0000 0000 0e00 0000 0e00 0100 0000  ................
  000007c0: 0300 0000 0007 f703 2a00 0000 1000 0000  ........*.......
  000007d0: 0000 0000 0000 0000 0c00 0000 3800 0000  ............8...
  000007e0: 1800 0000 0200 0000 0000 3805 0000 0100  ..........8.....
  000007f0: 2f75 7372 2f6c 6962 2f6c 6962 5379 7374  /usr/lib/libSyst
--- 117,128 ----
  00000740: 0b00 0000 5000 0000 0000 0000 a70b 0000  ....P...........
  00000750: a70b 0000 5603 0000 fd0e 0000 3300 0000  ....V.......3...
  00000760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000770: 0000 0000 0000 0000 6048 1700 5801 0000  ........`H..X...
  00000780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
! 00000790: 1b00 0000 1800 0000 f0cb 7393 b5bd 3b76  ..........s...;v
! 000007a0: 9fb5 5f03 dd32 4c8a 3200 0000 2000 0000  .._..2L.2... ...
  000007b0: 0100 0000 0000 0e00 0000 0e00 0100 0000  ................
  000007c0: 0300 0000 0007 f703 2a00 0000 1000 0000  ........*.......
  000007d0: 0000 0000 0000 0000 0c00 0000 3800 0000  ............8...
  000007e0: 1800 0000 0200 0000 0000 3805 0000 0100  ..........8.....
  000007f0: 2f75 7372 2f6c 6962 2f6c 6962 5379 7374  /usr/lib/libSyst
***************
*** 916,928 ****
  00003930: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  00003940: ff20 476f 2062 7569 6c64 2049 443a 2022  . Go build ID: "
  00003950: 4c42 3648 7a64 376b 6c31 6258 726e 7948  LB6Hzd7kl1bXrnyH
  00003960: 697a 5859 2f70 2d7a 3839 4146 354e 6136  izXY/p-z89AF5Na6
  00003970: 6f31 736e 4466 704a 682f 3644 745f 4f44  o1snDfpJh/6Dt_OD
! 00003980: 4769 7571 452d 7652 4e52 7831 5878 2f66  GiuqE-vRNRx1Xx/f
! 00003990: 4735 6f64 7563 424e 6d4f 7053 6455 4e51  G5oducBNmOpSdUNQ
! 000039a0: 7861 5522 0a20 ffcc cccc cccc cccc cccc  xaU". ..........
  000039b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  000039c0: 5548 89e5 4883 ec10 4c8b 3dd1 4607 0049  UH..H...L.=.F..I
  000039d0: 8b4f 084c 8b3d c646 0700 498b 170f 1f00  .O.L.=.F..I.....
  000039e0: 4839 c87d 1b73 3948 c1e0 0448 8b0c 0248  H9.}.s9H...H...H
  000039f0: 8b5c 0208 4889 c848 83c4 105d c30f 1f00  .\..H..H...]....
--- 916,928 ----
  00003930: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  00003940: ff20 476f 2062 7569 6c64 2049 443a 2022  . Go build ID: "
  00003950: 4c42 3648 7a64 376b 6c31 6258 726e 7948  LB6Hzd7kl1bXrnyH
  00003960: 697a 5859 2f70 2d7a 3839 4146 354e 6136  izXY/p-z89AF5Na6
  00003970: 6f31 736e 4466 704a 682f 3644 745f 4f44  o1snDfpJh/6Dt_OD
! 00003980: 4769 7571 452d 7652 4e52 7831 5878 2f6d  GiuqE-vRNRx1Xx/m
! 00003990: 436f 5971 6470 5854 386a 7a54 6e64 6d4f  CoYqdpXT8jzTndmO
! 000039a0: 3038 5022 0a20 ffcc cccc cccc cccc cccc  08P". ..........
  000039b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  000039c0: 5548 89e5 4883 ec10 4c8b 3dd1 4607 0049  UH..H...L.=.F..I
  000039d0: 8b4f 084c 8b3d c646 0700 498b 170f 1f00  .O.L.=.F..I.....
  000039e0: 4839 c87d 1b73 3948 c1e0 0448 8b0c 0248  H9.}.s9H...H...H
  000039f0: 8b5c 0208 4889 c848 83c4 105d c30f 1f00  .\..H..H...]....
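A rough way to locate such differences without going through hex dumps is to walk both files and report the byte ranges that disagree. The following is only a sketch: `diffRanges` is a hypothetical helper (not part of cmd/go or the test suite), and the sample bytes are the row at offset 0x790 from the two dumps above.

```go
package main

import "fmt"

// diffRanges returns the half-open [start, end) byte ranges where a and b
// differ. If the lengths differ, the unmatched tail is reported as a
// final range.
func diffRanges(a, b []byte) [][2]int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var out [][2]int
	start := -1
	for i := 0; i < n; i++ {
		switch {
		case a[i] != b[i] && start < 0:
			start = i // a differing run begins here
		case a[i] == b[i] && start >= 0:
			out = append(out, [2]int{start, i}) // the run ended at i
			start = -1
		}
	}
	if start >= 0 {
		out = append(out, [2]int{start, n})
	}
	if len(a) != len(b) {
		longer := len(a)
		if len(b) > longer {
			longer = len(b)
		}
		out = append(out, [2]int{n, longer})
	}
	return out
}

func main() {
	// Row 0x790 of a.hex and b.hex, copied from the diff above.
	a := []byte{0x1b, 0, 0, 0, 0x18, 0, 0, 0, 0xed, 0xfb, 0x2d, 0x7d, 0xab, 0x6e, 0x37, 0x4d}
	b := []byte{0x1b, 0, 0, 0, 0x18, 0, 0, 0, 0xf0, 0xcb, 0x73, 0x93, 0xb5, 0xbd, 0x3b, 0x76}
	for _, r := range diffRanges(a, b) {
		fmt.Printf("row offsets %d-%d differ (%d bytes)\n", r[0], r[1]-1, r[1]-r[0])
	}
}
```

For real files, the two slices could come from `os.ReadFile`; the helper itself only needs the bytes.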

@prattmic
Member Author

prattmic commented Jan 3, 2024

Based on the otool output, it looks like this other component is the LC_UUID value: EDFB2D7D-AB6E-374D-8EBA-7C75012CC264 vs F0CB7393-B5BD-3B76-9FB5-5F03DD324C8A.

I don't know MachO very well, but it seems that this is just another build ID...

@bcmills
Contributor

bcmills commented Jan 3, 2024

Huh. See also https://bugs.chromium.org/p/chromium/issues/detail?id=1068970. 😵‍💫

@prattmic
Member Author

prattmic commented Jan 3, 2024

The output of GODEBUG=gocachehash=1 is identical for both builds.

@bcmills
Contributor

bcmills commented Jan 3, 2024

Yeah, looks like the LC_UUID depends on at least the last component of the output file path:
https://github.com/apple-opensource/ld64/blame/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/OutputFile.cpp#L3724-L3733

I'm not sure what _options.buildContextName() is derived from.
Looks like maybe the RC_RELEASE environment variable?
(https://github.com/apple-opensource/ld64/blob/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/Options.cpp#L4529-L4530C30)

@prattmic
Member Author

prattmic commented Jan 3, 2024

Thanks for the reference! Looking at the go build -x output, the last few steps are:

GOROOT_FINAL='$GOROOT' /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
/Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/buildid -w $WORK/b001/exe/a.out.so # internal
mv $WORK/b001/exe/a.out.so b.so

Running the link step (first line) multiple times, even without changing the path, yields a .so with different LC_UUID each time:

$ /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
$ shasum5.30 a.out.so 
9029867749bbecd942c0037526ababa6f0d83932  a.out.so
$ /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
$ shasum5.30 a.out.so
d76b438421660ffd24d4ca06dc30b3150b0b9fee  a.out.so

Diffing the binary shows that the UUID is the only difference (Go Build ID is identical presumably because I'm not running the buildid command).

@prattmic
Member Author

prattmic commented Jan 3, 2024

It doesn't seem to be related to the file paths. cmd/link invokes clang like so:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/go.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000000.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000001.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000002.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000003.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000004.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000005.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000006.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000007.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000008.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000009.o" "-O2" "-g" "-lpthread"

The go-link-202179431 path component changes each iteration, but this can be forced to be the same with -tmpdir /tmp/tmp:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/tmp/tmp/go.o" "/tmp/tmp/000000.o" "/tmp/tmp/000001.o" "/tmp/tmp/000002.o" "/tmp/tmp/000003.o" "/tmp/tmp/000004.o" "/tmp/tmp/000005.o" "/tmp/tmp/000006.o" "/tmp/tmp/000007.o" "/tmp/tmp/000008.o" "/tmp/tmp/000009.o" "-O2" "-g" "-lpthread"

Even with identical paths each time we get different output.

@bcmills
Contributor

bcmills commented Jan 3, 2024

@prattmic
Member Author

prattmic commented Jan 3, 2024

Perhaps, but I'd like to better understand what is happening. Plus it seems like some users may want the UUID, as Chrome did.

FWIW, the clang command consistently generates identical output from the same inputs. It seems it is the output of dsymutil that is differing:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/tmp/tmp2/go.o" "/tmp/tmp2/000000.o" "/tmp/tmp2/000001.o" "/tmp/tmp2/000002.o" "/tmp/tmp2/000003.o" "/tmp/tmp2/000004.o" "/tmp/tmp2/000005.o" "/tmp/tmp2/000006.o" "/tmp/tmp2/000007.o" "/tmp/tmp2/000008.o" "/tmp/tmp2/000009.o" "-O2" "-g" "-lpthread"
host link dsymutil: "/tmp/xcode/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/dsymutil" "-f" "a.out.so" "-o" "/tmp/tmp2/go.dwarf"
host link strip: "/tmp/xcode/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/strip" "-S" "a.out.so"
$ for f in /tmp/tmp/*; do shasum5.30 $f; done
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp/000000.o
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp/000001.o
c85aae746d1a1f270a1cea350b377d1b5f9ff376  /tmp/tmp/000002.o
a4741bd785ce981348a908880917a43074de02a7  /tmp/tmp/000003.o
2050abddb774ad2600414caa5596608d5260424c  /tmp/tmp/000004.o
160fbd77666d9949ef8b8fa502ad1e665388dad3  /tmp/tmp/000005.o
0c4db5947508c4a8c035484c4ef97182efb572e1  /tmp/tmp/000006.o
e2724d0e3997897da123884a7dd2496d094bf45e  /tmp/tmp/000007.o
311978e69c428fbfc059f92eb48ae0a8e3e19d80  /tmp/tmp/000008.o
5fc69d54547370cba80b9d5272b62db3126ff85f  /tmp/tmp/000009.o
aa2a253f8abd8c84c3ef27ddfc9c12bc1481f277  /tmp/tmp/go.dwarf
480e9721586f5e764d59826677acff5d6bbe3588  /tmp/tmp/go.o
556b5a818027717b0399b1e94ba268ff147c932e  /tmp/tmp/trivial.c
$ for f in /tmp/tmp2/*; do shasum5.30 $f; done
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp2/000000.o
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp2/000001.o
c85aae746d1a1f270a1cea350b377d1b5f9ff376  /tmp/tmp2/000002.o
a4741bd785ce981348a908880917a43074de02a7  /tmp/tmp2/000003.o
2050abddb774ad2600414caa5596608d5260424c  /tmp/tmp2/000004.o
160fbd77666d9949ef8b8fa502ad1e665388dad3  /tmp/tmp2/000005.o
0c4db5947508c4a8c035484c4ef97182efb572e1  /tmp/tmp2/000006.o
e2724d0e3997897da123884a7dd2496d094bf45e  /tmp/tmp2/000007.o
311978e69c428fbfc059f92eb48ae0a8e3e19d80  /tmp/tmp2/000008.o
5fc69d54547370cba80b9d5272b62db3126ff85f  /tmp/tmp2/000009.o
3c3cf42192f5f6427f05b2dceca7c5733e6f1721  /tmp/tmp2/go.dwarf
480e9721586f5e764d59826677acff5d6bbe3588  /tmp/tmp2/go.o
556b5a818027717b0399b1e94ba268ff147c932e  /tmp/tmp2/trivial.c

(go.dwarf differs)

Edit: I'm not 100% certain about dsymutil being at fault here, as I can't seem to reproduce the non-reproducibility when running clang + dsymutil manually.
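The directory comparison above can also be scripted. This is a sketch only: `digestDir` and `differingNames` are hypothetical helpers, and SHA-1 is used simply because the 40-hex-digit sums printed by shasum above are SHA-1 digests.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// digestDir maps the base name of each regular file in dir to its hex SHA-1.
func digestDir(dir string) (map[string]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	sums := make(map[string]string)
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		sums[e.Name()] = fmt.Sprintf("%x", sha1.Sum(data))
	}
	return sums, nil
}

// differingNames returns, sorted, every name whose digest differs between
// a and b or that is present in only one of the two maps.
func differingNames(a, b map[string]string) []string {
	var out []string
	for name, sum := range a {
		if other, ok := b[name]; !ok || other != sum {
			out = append(out, name)
		}
	}
	for name := range b {
		if _, ok := a[name]; !ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: cmpdirs DIR1 DIR2")
		return
	}
	a, err := digestDir(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	b, err := digestDir(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, name := range differingNames(a, b) {
		fmt.Println(name, "differs")
	}
}
```

Pointed at /tmp/tmp and /tmp/tmp2 from the listing above, it would be expected to report only go.dwarf.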

@prattmic
Member Author

prattmic commented Jan 3, 2024

cc @thanm see #64947 (comment) for reproducer instructions

@bcmills bcmills changed the title cmd/go: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing continuously on LUCI gotip-darwin-amd64-longtest builder cmd/go: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID Jan 3, 2024
@bcmills bcmills added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 3, 2024
@bcmills bcmills changed the title cmd/go: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID cmd/go,cmd/link: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID Jan 3, 2024
@thanm
Contributor

thanm commented Jan 4, 2024

I spent a little while looking at this. What's weird is that the actual DWARF in the two go.dwarf files is identical; what is different is (again) the uuid. E.g.

$  llvm-dwarfdump-16 xxx/tmpdir1/go.dwarf > dw1.txt
$  llvm-dwarfdump-16 xxx/tmpdir2/go.dwarf > dw2.txt
$ diff dw1.txt dw2.txt
1c1
< xxx/tmpdir1/go.dwarf:	file format Mach-O 64-bit x86-64
---
> xxx/tmpdir2/go.dwarf:	file format Mach-O 64-bit x86-64
$
$ llvm-objdump-16 --macho --all-headers xxx/tmpdir1/go.dwarf > h1.txt
$ llvm-objdump-16 --macho --all-headers xxx/tmpdir2/go.dwarf > h2.txt
$ diff h1.txt h2.txt
1c1
< xxx/tmpdir1/go.dwarf:
---
> xxx/tmpdir2/go.dwarf:
3880c3880
<     uuid 3BA8085B-DD85-312C-B9AD-2CEDAE928E62
---
>     uuid E559C1A0-DDFF-3BD3-8CD8-7652DC367F9F
$

So basically what seems to be happening is that dsymutil is generating a different uuid each time and embedding it into the go.dwarf file, in spite of the fact that the dwarf is the same, hmm.

I will spend a little time digging into the dsymutil source code, maybe I can find out more.

@prattmic
Member Author

prattmic commented Jan 4, 2024

FWIW, the version of Xcode we're installing is 15.0.0. I peeked at the release notes for 15.0.1 and 15.1 and nothing stood out as a fix for this kind of issue, but I'll see if we can get a different version to test.

@thanm
Contributor

thanm commented Jan 4, 2024

OK (duh) in fact dsymutil is just faithfully copying the uuid from its input, so the problem here is that clang is generating a different uuid. I'll look into the clang source code instead.

@prattmic
Member Author

prattmic commented Jan 4, 2024

FWIW, it looks like there are more versions of Xcode available to try out, though I haven't tested them:

  • mac_toolchain install -xcode-version 15a240d: 15.0
  • mac_toolchain install -xcode-version 15A507: 15.0.1
  • mac_toolchain install -xcode-version 15C65: 15.1
  • mac_toolchain install -xcode-version 15C5500c: 15.2 (beta, I guess)

@thanm
Contributor

thanm commented Jan 4, 2024

OK, I think I am making some progress here. For a while I thought this might be an ld-prime problem, but that turned out to be a red herring. In fact it looks like it is a bit simpler than that.

Running the link with -ldflags=-v -tmpdir=/tmp/tmp I see

# command-line-arguments
HEADER = -H1 -T0x1001000 -R0x1000
host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load"
 "-dynamiclib" "-o" "/Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/tmp/go-build2833294421/b001/exe/a.out.so" "-Qunused-arguments" "/tmp/tmp/go.o" "/tmp/tmp/000000.o" 
"/tmp/tmp/000001.o" "/tmp/tmp/000002.o" "/tmp/tmp/000003.o" "/tmp/tmp/000004.o" "/tmp/tmp/000005.o" 
"/tmp/tmp/000006.o" "/tmp/tmp/000007.o" "/tmp/tmp/000008.o" "-O2" "-g" "-lpthread" "-ld64"

Note the "-o" output, which incorporates the go build dir go-build2833294421, which is going to vary from build to build. The problem is that this is being incorporated into the dynamic info in the a.out.so output, e.g. from the output of llvm-objdump-16 --macho --all-headers I see:

Load command 4
          cmd LC_ID_DYLIB
      cmdsize 128
         name /Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/tmp/go-build1516003297/b001/exe/a.out.so (offset 24)
   time stamp 1 Thu Jan  1 00:00:01 1970
      current version 0.0.0
compatibility version 0.0.0

and the external linker is almost certainly going to hash this section when creating the build ID.

Not sure what the best approach is to fix this. Also not sure why we aren't seeing similar problems with the older gomotes (I will spin one up and compare).

@bcmills
Contributor

bcmills commented Jan 4, 2024

@thanm, that sounds very similar to an existing reproducibility workaround here:
https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/work/gc.go;l=649-663;drc=66b8107a26e515bbe19855d358bdf12bd6326347

Perhaps we need to extend that workaround to more build modes, or take a similar approach when running other commands?

@thanm
Contributor

thanm commented Jan 4, 2024

Well phooey, I am afraid I've had a Homer Simpson moment here.

My gomote expired, and I created a new one, but when I started using the new one I didn't update the PATH setting in my script, so it wasn't picking up the correct version of Go. It looks like with LUCI gomotes the location of GOROOT is slightly different each time:

bindir from my first gomote: "/Users/swarming/.swarming/w/ituz4dfd04/workdir-swarming-task/go/bin"
bindir from my second gomote: "/Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/go/bin"

Oh well, a learning experience I suppose.

That explains why I was not picking up Cherry's fix (https://go-review.googlesource.com/c/go/+/478196, which extends the workaround that you mention Bryan).

Now I'm back to seeing only a difference in the UUID.

@thanm
Contributor

thanm commented Jan 4, 2024

One more important bit of info: the problem goes away if I build with -extldflags=-ld_classic, meaning that this may be another thing we can add to the long list of problems that crop up with "ld-prime" (e.g. issue #61229).

Looking at the setup we have on our old-style gomotes I see:

$ gomote run `cat mote.txt` softwareupdate --history

Display Name                                       Version    Date                  
------------                                       -------    ----                  
Command Line Tools for Xcode                       14.0       11/07/2022, 16:16:24  
Command Line Tools for Xcode                       14.1       11/07/2022, 16:16:24

i.e. command line tools, not a complete Xcode installation. For the new LUCI gomotes we are obviously using a full Xcode install, and we're using version 15, which defaults to ld-prime.

@thanm
Contributor

thanm commented Jan 4, 2024

> FWIW, it looks like there are more versions of Xcode available to try out, though I haven't tested them:
>
>   • mac_toolchain install -xcode-version 15a240d: 15.0
>   • mac_toolchain install -xcode-version 15A507: 15.0.1
>   • mac_toolchain install -xcode-version 15C65: 15.1
>   • mac_toolchain install -xcode-version 15C5500c: 15.2 (beta, I guess)

I just tested the most recent one (15.2) and it appears to have the same problem. Hmph.

@thanm
Contributor

thanm commented Jan 4, 2024

OK, one more update. I can reproduce the problem with just the C compiler, and what I think must be going on is that the name of the output file is being incorporated into the UUID. If I do:

$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o b.so example.cpp
$ llvm-objdump-16 --macho --all-headers a.so > ash.txt
$ llvm-objdump-16 --macho --all-headers b.so > bsh.txt

then I see a difference, whereas if I instead do

$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ mv a.so b.so
$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ llvm-objdump-16 --macho --all-headers a.so > ash.txt
$ llvm-objdump-16 --macho --all-headers b.so > bsh.txt

The UUIDs are the same (the only thing different in the second example is that both builds target a.so).

How would we feel about changing the test in question to target the same filename? Or does the current ld-prime behavior not really meet our criteria for reproducible builds?

@bcmills
Contributor

bcmills commented Jan 4, 2024

Huh. I guess it would be ok for the tests to mv the output file so that they can go build -o to the same filename, although that seems a bit subtle.

Does the LC_UUID depend only on the output file's basename, or on the directory path as well? I think it's probably ok for it to depend on the basename, but (especially if the user is building with -trimpath) we should ensure that it doesn't depend on the current working directory.

@thanm
Contributor

thanm commented Jan 5, 2024

> Does the LC_UUID depend only on the output file's basename

I checked just now and it looks like it is just the output file basename, not the directory. If I run the C compiler building example.cpp once in directory xxx, then do the same compile in directory yyy, I get identical binaries. I'll send a CL, although I agree it is a bit weird.

@thanm
Contributor

thanm commented Jan 5, 2024

I poked a bit at the other failure (TestScript/build_issue48319). That one looks like it will require another Go command fix -- since this is not a shared-mode build the "-o" argument being passed to the external linker is a full path. Hence the difference in build IDs.

@bcmills would it make sense to take the code you mentioned before (https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/work/gc.go;l=649-663;drc=66b8107a26e515bbe19855d358bdf12bd6326347) and extend it even further (e.g. any link being done on Darwin)?

@gopherbot

Change https://go.dev/cl/554059 mentions this issue: cmd/go/testdata: tweak build_plugin_reproducible test for Xcode 15

@cherrymui
Member

If it is only a test issue, and user's normal "go build" (by default, without any extra weird flags) is still reproducible, I think it is okay to just update the test. Another option is that we (over)write the LC_UUID in the Go linker after C linking, based on the file content (or the Go build ID). (We overwrite the binary for DWARF combining anyway, but that may be changed with #62577.)

For the other issue, if it is not a shared object there would be no LC_ID_DYLIB, so it is still the UUID that is affected by the output file path?

@bcmills
Contributor

bcmills commented Jan 9, 2024

> If it is only a test issue, and user's normal "go build" (by default, without any extra weird flags) is still reproducible, I think it is okay to just update the test.

If I understand correctly, it's in an awkward grey area: the build is “reproducible” but only if you specify an output filename with the same basename at each invocation.

That is:

$ go build -trimpath -o foo
$ mv foo bar
$ go build -trimpath -o foo

will produce a foo identical to bar, but

$ go build -trimpath -o foo
$ go build -trimpath -o bar

will not.

Since -trimpath is supposed to redact local filenames, and the name of the output file is arguably a local filename, that technically fails reproducibility. On the other hand, it is still the case that repeating exactly the same command — provided that the -o flag is also the same — should continue to produce the same (reproducible) output bytes independent of the working directory.

@cherrymui
Member

We could have the Go linker always pass, say, a.out to the C linker, and rename the file afterward.

@cherrymui
Member

Perhaps the most reliable way is to just write the UUID ourselves. For the short term (and possibly backport), we can work around it in the test.
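A minimal sketch of what "write the UUID ourselves" could look like, assuming the UUID is derived purely from the Go build ID. The hash choice (SHA-256 truncated to 16 bytes) and the name `uuidFromBuildID` are assumptions for illustration, not the linker's actual scheme. The version nibble is set to 3 only to mimic the shape of the UUIDs seen earlier in this thread.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// uuidFromBuildID derives 16 UUID bytes from the build ID alone, so the
// result does not depend on output paths or timestamps.
func uuidFromBuildID(buildID string) [16]byte {
	sum := sha256.Sum256([]byte(buildID))
	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = u[6]&0x0f | 0x30 // version nibble 3, like the UUIDs above
	u[8] = u[8]&0x3f | 0x80 // RFC 4122 variant bits
	return u
}

func main() {
	// Build ID taken from the link command line quoted earlier.
	id := "LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY"
	u := uuidFromBuildID(id)
	fmt.Printf("%X-%X-%X-%X-%X\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}
```

Because the only input is the build ID, two links of the same inputs would get the same UUID regardless of the -o argument.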

@thanm
Contributor

thanm commented Jan 9, 2024

Circling back on this: when I ran the experiment using clang (described in #64947 (comment)) I assumed that the same fix would work for "go build", but in fact it looks like the UUIDs are still different, so I think there is still some work to do here. Apologies, I have been busy working on other bugs, I will spend some more time on it later this afternoon.

@thanm
Contributor

thanm commented Jan 10, 2024

Tiny bit more progress:

I hacked the Go linker code to save off a copy of the original "a.out.so" produced by the Apple linker before it gets run through dsymutil and then strip. When I compare the two instances of the original a.out.so (i.e. before being stripped) I can see differences in the symbol table output:

+ diff xxx/asvsh.txt xxx/bsvsh.txt
1c1
< xxx/tmpdir1/a.out.so.save:
---
> xxx/tmpdir2/a.out.so.save:
3311c3311
< 00000000659ee137      d  *UND* /private/tmp/tmp/go.o
---
> 00000000659ee145      d  *UND* /private/tmp/tmp/go.o
12706c12706
< 00000000659ee137      d  *UND* /private/tmp/tmp/000004.o
---
> 00000000659ee145      d  *UND* /private/tmp/tmp/000004.o
12714c12714
< 00000000659ee137      d  *UND* /private/tmp/tmp/000005.o
---
> 00000000659ee145      d  *UND* /private/tmp/tmp/000005.o

So this explains why we have this bizarre situation where the only thing different in the final binary is the build ID.

These phantom symbols are really weird, too: their value doesn't seem to be meaningful at all (normally an undef symbol has a zero value?), and doesn't correspond to anything in the section table. I will see if I can reproduce this same weirdness with a pure C example -- if this works then we can file a bug against the Apple linker.

@thanm
Contributor

thanm commented Jan 10, 2024

OK, I have successfully reproduced the problem with a C program via:

$ rm -rf /tmp/tmp
$ mkdir /tmp/tmp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj1.o a.cpp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj2.o b.cpp
$ clang -arch x86_64 -m64 -dynamiclib "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" -o a.out.so /tmp/tmp/obj1.o /tmp/tmp/obj2.o
$ mv a.out.so a.so
$
$ rm -rf /tmp/tmp
$ mkdir /tmp/tmp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj1.o a.cpp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj2.o b.cpp
$ clang -arch x86_64 -m64 -dynamiclib "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" -o a.out.so /tmp/tmp/obj1.o /tmp/tmp/obj2.o
$ mv a.out.so b.so

Diffing the objdump output from a.so and b.so above produces

1c1
< a.so:
---
> b.so:
17c17
< 00000000659ee84c      d  *UND* /private/tmp/tmp/obj1.o
---
> 00000000659ee84d      d  *UND* /private/tmp/tmp/obj1.o
31c31
< 00000000659ee84c      d  *UND* /private/tmp/tmp/obj2.o
---
> 00000000659ee84d      d  *UND* /private/tmp/tmp/obj2.o
234c234
<     uuid 78168A31-07CA-3F90-B84D-A352B00AD9AE
---
>     uuid 6D77FC7A-5C55-3D9E-BCBF-46C334F27478

I think Cherry's idea of rewriting the build ID is looking a bit more attractive at this point.

@mknyszek mknyszek added this to the Backlog milestone Jan 10, 2024
@cherrymui
Member

Well, it seems the values of those symbols aren't really arbitrary:

$ objdump -t a.out | grep a.o
a.out:	file format mach-o 64-bit x86-64
0000000065a02667      d  *UND* /private/tmp/pp/a.o
$ date -r 0x0000000065a02667
Thu Jan 11 12:33:27 EST 2024
$ stat a.o
16777220 270066454 -rw-r--r-- 1 cherryyz wheel 0 1856 "Jan 11 12:37:37 2024" "Jan 11 12:33:27 2024" "Jan 11 12:33:27 2024" "Jan 11 12:33:27 2024" 4096 8 0 a.o

It is actually the timestamp of the .o file (!)...
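
The same decoding can be done in Go. This is a small sketch with a hypothetical helper, `symbolTime`, which treats the hex symbol value printed by `objdump -t` as seconds since the Unix epoch.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// symbolTime interprets a hex symbol value, as printed by objdump -t,
// as a count of seconds since the Unix epoch.
func symbolTime(hexValue string) (time.Time, error) {
	sec, err := strconv.ParseInt(hexValue, 16, 64)
	if err != nil {
		return time.Time{}, err
	}
	return time.Unix(sec, 0), nil
}

func main() {
	t, err := symbolTime("0000000065a02667")
	if err != nil {
		panic(err)
	}
	// 12:33:27 EST is 17:33:27 UTC.
	fmt.Println(t.UTC().Format(time.RFC3339)) // prints 2024-01-11T17:33:27Z
}
```

Applied to the earlier diff, 659ee137 and 659ee145 decode to times 14 seconds apart, which is consistent with the object files of the two link runs having been written a few seconds apart.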

I guess one way to work around it is to zero the timestamps of the .o files before feeding them to the C linker...
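
A rough sketch of that idea follows, using `os.Chtimes` to set a fixed modification time on each object file. The names `fixedTime`, `normalizeMtimes`, and `demo` are hypothetical, and whether the Apple linker then emits identical symbol values and UUIDs has not been verified in this thread.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fixedTime is an arbitrary constant timestamp (the Unix epoch).
var fixedTime = time.Unix(0, 0)

// normalizeMtimes sets the access and modification times of every
// path to t, so that they no longer depend on when the files were written.
func normalizeMtimes(paths []string, t time.Time) error {
	for _, p := range paths {
		if err := os.Chtimes(p, t, t); err != nil {
			return fmt.Errorf("normalizing %s: %w", p, err)
		}
	}
	return nil
}

// demo creates a temporary stand-in for an object file, normalizes it,
// and returns the modification time reported afterward.
func demo() (time.Time, error) {
	f, err := os.CreateTemp("", "obj-*.o")
	if err != nil {
		return time.Time{}, err
	}
	defer os.Remove(f.Name())
	if err := f.Close(); err != nil {
		return time.Time{}, err
	}
	if err := normalizeMtimes([]string{f.Name()}, fixedTime); err != nil {
		return time.Time{}, err
	}
	info, err := os.Stat(f.Name())
	if err != nil {
		return time.Time{}, err
	}
	return info.ModTime(), nil
}

func main() {
	mt, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(mt.UTC().Format(time.RFC3339))
}
```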

@bcmills
Contributor

bcmills commented Jan 11, 2024

Maybe try setting ZERO_AR_DATE=1 in the environment?
(Just a guess based on https://github.com/apple-opensource/ld64/blob/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/Options.cpp#L4513.)
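
If one wanted to try this from Go, the variable could be added to the environment of the external linker invocation. The sketch below only prepares such a command and does not run it; `withZeroARDate` and `linkCommand` are hypothetical helpers, the clang arguments are placeholders, and the effect of ZERO_AR_DATE on this problem remains a guess, as stated above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withZeroARDate returns a copy of env with ZERO_AR_DATE=1 set,
// dropping any earlier setting of the same variable.
func withZeroARDate(env []string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "ZERO_AR_DATE=") {
			out = append(out, kv)
		}
	}
	return append(out, "ZERO_AR_DATE=1")
}

// linkCommand prepares, but does not run, an external linker command
// whose environment includes ZERO_AR_DATE=1.
func linkCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Env = withZeroARDate(os.Environ())
	return cmd
}

func main() {
	cmd := linkCommand("clang", "-o", "a.out", "a.o") // placeholder arguments
	fmt.Println(cmd.Env[len(cmd.Env)-1])              // prints ZERO_AR_DATE=1
}
```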

@dmitshur dmitshur modified the milestones: Backlog, Go1.23 Feb 11, 2024
@prattmic
Member Author

Should we add skips for these tests while we figure out how to work around this?

@thanm
Contributor

thanm commented Feb 20, 2024

Should we add skips for these tests while we figure out how to work around this?

Probably a good idea. I sent a CL (565376).

@gopherbot

Change https://go.dev/cl/565376 mentions this issue: cmd/go/testdata/script: add darwin skips for selected buildrepro tests

gopherbot pushed a commit that referenced this issue Feb 20, 2024
Skip two build reproducibility tests (build_issue48319 and
build_plugin_reproducible) on Darwin if GO_BUILDER_NAME is set until
issue 64947 can be resolved; on the LUCI darwin longtest builder the
more contemporary version of Xcode is doing things that are unfriendly
to Go's build reproducibility.

For #64947.

Change-Id: Iebd433ad6dfeb84b6504ae9355231d897d8ae174
Reviewed-on: https://go-review.googlesource.com/c/go/+/565376
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@bcmills
Contributor

bcmills commented Mar 14, 2024
