Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 #32218

Open
josharian opened this issue May 24, 2019 · 10 comments
Open

cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 #32218

josharian opened this issue May 24, 2019 · 10 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@josharian
Copy link
Contributor

At b84e0bc, on darwin/amd64:

$ go test -run=TestDWARF/testprog -count=10 cmd/link
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (2.03s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.50s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.78s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.52s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.96s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.83s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.98s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#02 (1.99s)
        dwarf_test.go:96: exit status 255
FAIL
FAIL	cmd/link	47.577s
FAIL

I've also seen

--- FAIL: TestDWARF (0.22s)
    --- FAIL: TestDWARF/testprog (3.00s)
        dwarf_test.go:96: exit status 255
FAIL

The failures are intermittent.

cc @dr2chase

@josharian josharian added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels May 24, 2019
@josharian josharian added this to the Go1.13 milestone May 24, 2019
@cherrymui
Copy link
Member

Happened for me as well, with exactly same error. I commented on CL https://go-review.googlesource.com/c/go/+/170377

@odeke-em
Copy link
Member

Running on c11f6c4

go version devel +c11f6c4929 Fri Jun 21 21:24:47 2019 +0000 darwin/amd64

uname

uname -a
Darwin Emmanuels-MacBook-Pro-2.local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

On the first try I get a failure but this time spectacularly with a 1.2MB log whose beginning is something like

--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprogcgo#01 (2.33s)
        dwarf_test.go:96: symbols testprogcgo.exe: exit status 255: /var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [arm64, 0.004386 seconds]:
                A829102B-EE24-329C-874B-9BAAE2337C79 /private/var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [AOUT, PIE, FaultedFromDisk]
                    0x0000000000000000 (0x100000000) __PAGEZERO SEGMENT
                    0x0000000100000000 ( 0xbc000) __TEXT SEGMENT
                        0x0000000100000000 (  0x4c90) MACH_HEADER
                        0x0000000100004c90 ( 0xb67a4) __TEXT __text
                            0x0000000100004c90 (    0x68) runtime.text [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004cf8 (    0x40) internal/cpu.Initialize [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004d38 (   0x620) internal/cpu.processOptions [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005358 (    0x50) internal/cpu.indexByte [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x00000001000053a8 (   0x680) internal/cpu.doinit [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a28 (    0x10) type..hash.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a38 (    0x10) type..eq.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a48 (    0x50) type..hash.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a98 (    0x60) type..eq.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005af8 (    0x70) type..hash.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005b68 (    0xa0) type..eq.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c08 (    0x50) type..hash.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c58 (    0x50) type..eq.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005ca8 (    0x70) type..hash.[24]internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
....
<FRAMES_INTENTIONALLY_ELIDED_BY_MYSELF_SINCE_IT_IS_A_LARGE_FILE>
                            0x00000001001e3d20 (    0x10) syscall.envOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d30 (    0x10) time.localOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d40 (    0x10) runtime.writeBarrier [NameNList, MangledNameNList, NList]
                            0x00000001001e3d50 (    0x20) syscall.ForkLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3d70 (    0x30) syscall.envLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3da0 (    0x20) runtime.hashkey [NameNList, MangledNameNList, NList]
                            0x00000001001e3dc0 (    0x40) runtime.newmHandoff [NameNList, MangledNameNList, NList]
                            0x00000001001e3e00 (    0x40) runtime.sig [NameNList, MangledNameNList, NList]
                            0x00000001001e3e40 (    0x40) runtime.execLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3e80 (    0x40) runtime.debug [NameNList, MangledNameNList, NList]
                            0x00000001001e3ec0 (    0x40) runtime.finptrmask [NameNList, MangledNameNList, NList]
                            0x00000001001e3f00 (    0x80) runtime.aeskeysched [NameNList, MangledNameNList, NList]
                            0x00000001001e3f80 (    0x80) runtime.handlingSig [NameNList, MangledNameNList, NList]
                            0x00000001001e4000 (    0xa0) internal/cpu.X86 [NameNList, MangledNameNList, NList]
                            0x00000001001e40a0 (    0xa0) internal/cpu.ARM64 [NameNList, MangledNameNList, NList]
                            0x00000001001e4140 (    0xa0) runtime.gcController [NameNList, MangledNameNList, NList]
                            0x00000001001e41e0 (   0x100) runtime.fwdSig [NameNList, MangledNameNList, NList]
                            0x00000001001e42e0 (   0x200) runtime.printBacklog [NameNList, MangledNameNList, NList]
                            0x00000001001e44e0 (   0x400) runtime.zeroVal [NameNList, MangledNameNList, NList]
                            0x00000001001e48e0 (  0x1758) runtime.memstats [NameNList, MangledNameNList, NList]
                    0x00000001001e8000 ( 0x7c000) __LINKEDIT SEGMENT
            symbols[65885]: [fatal]  child process status could not be determined; exit code unavailable.
FAIL
exit status 1
FAIL    cmd/link        47.568s

with the output at https://gist.github.com/odeke-em/85bb51ded80d9239a97c435f875b04c5

@aclements
Copy link
Member

Some notes:

As far as I can tell, this has never happened on the dashboard based on greplogs -E 'dwarf_test.*(exit status 255|symbols)' -dashboard.

CL 171819 added better logging for this (previously we didn't print the symbols output). That's why the original report just has "exit status 255", but @odeke-em's report has much more information.

Unfortunately, it seems like the symbols tool may just be flaky. Just searching around for that error reveals other people seeing it occasionally in various macOS development tools outside the context of Go.

Could someone with a Mac try just running symbols (or maybe the test) in a loop to see how often it flakes?

@dr2chase
Copy link
Contributor

It may require Darwin 18.5, which is one back from current.
I've tried running it in a loop with luck at seeing the bug, will try some more.
(Maybe a busy Mac helps?)

@dr2chase
Copy link
Contributor

About 2000 repetitions on Darwin 18.6 (aka Mojave, macOS 10.14.5), with no failure, on a busy-ish machine. Still running.

@dmitshur dmitshur added OS-Darwin Testing An issue that has been verified to require only test changes, not just a test failure. labels Jul 16, 2019
@ianlancetaylor
Copy link
Contributor

In the output above the symbols program completes, but then reports an error symbols[65885]: [fatal] child process status could not be determined; exit code unavailable. I can see the error string if I run strings on /usr/bin/symbols, so it's definitely coming from there. I can't find any reports of this error on the Interwebs (the top search result for me is in fact this issue). I don't know why the symbols program would start a child process at all. I don't know why it would be unable to determine the status of that child process. I can't find the source code of the symbols program; I don't know if it is available at all.

Moving to 1.14 but if we can't figure out a reproducer, or get the source code, I doubt there is much that we can do.

@ianlancetaylor
Copy link
Contributor

Is this still happening for anyone?

@cherrymui
Copy link
Member

I haven't seen this for quite a while, since upgraded to macOS 10.14. I only saw it on macOS 10.13 in the past. My guess is that there may be some problem with the symbols command in earlier version.

@9nut
Copy link

9nut commented Jan 13, 2020

FYI, this might be an issue with xcode requiring an update. Here's why I think that:

When I tried to build 1.13.6 this morning I got the following:

--- FAIL: TestDWARF (0.19s)
    --- FAIL: TestDWARF/testprog (1.17s)
        dwarf_test.go:96: symbols testprog.exe: exit status 255: /var/folders/j0/pr17cml548j19x25h2h0dv9h0000gp/T/go-link-TestDWARF355434644/testprog.exe [x86_64, 0.001705 seconds]:
                null-uuid                            /private/var/folders/j0/pr17cml548j19x25h2h0dv9h0000gp/T/go-link-TestDWARF355434644/testprog.exe [AOUT, FaultedFromDisk]  
....

I updated my machine to macOS Catalina (10.15.2) last week, and had not started xcode in a while. I guessed I might need to update xcode. After starting xcode -- which required an update -- Go 1.13.6 was able to build without any issues.

@gopherbot
Copy link

Change https://golang.org/cl/297469 mentions this issue: cmd/link: disable flaky Darwin "symbols" test

gopherbot pushed a commit that referenced this issue Mar 3, 2021
About one run out of 3 it fails on my laptop, and
I am tired of having to be a nanny for my tests just
because of this one flaky test.  This has been a problem
for months.

Updates #32218.

Change-Id: I2871d4c6f47e9432d189ed7bdcda8f9c0871cfc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/297469
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

9 participants