build: misc/cgo/testcshared failure #23784

Invizory · 2018-02-12T00:03:06Z

What version of Go are you using (`go version`)?

I'm using go version go1.9.4 linux/amd64 to build the sources from the latest master, i.e. go version devel +829b64c1ea linux/amd64.

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/inviz/go"
GORACE=""
GOROOT="/usr/lib/go-1.9"
GOTOOLDIR="/usr/lib/go-1.9/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build762069868=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

cd go/src
./all.bash

What did you expect to see?

ALL TESTS PASSED

What did you see instead?

Building Go cmd/dist using /usr/lib/go-1.9.
Building Go toolchain1 using /usr/lib/go-1.9.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/amd64.

##### Testing packages.

...

##### ../misc/cgo/testcshared
--- FAIL: TestUnexportedSymbols (2.95s)
        cshared_test.go:349: run: [go build -buildmode=c-shared -installsuffix testcshared -o libgo2.so libgo2]
        cshared_test.go:363: run: [gcc -fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build486143289=/tmp/go-build -gno-record-gcc-switches -I pkg/linux_amd64_shared -o testp2 main2.c -Wl,--no-as-needed libgo2.so]
        cshared_test.go:369: command failed: [./testp2]
                exit status 2
                BUG: failed to read any data from pipe

FAIL

This test is very unstable, though, and may fail or may pass with approximate probability of 1/2.

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2018-02-12T05:06:32Z

That test runs these commands from the misc/cgo/testcshared directory:

GOPATH=`pwd` go build -buildmode=c-shared -installsuffix testcshared -o libgo2.so libgo2
gcc -o testp2 main2.c -Wl,--no-as-needed libgo2.so
LD_LIBRARY_PATH=. ./testp2

(Running those commands will leave the files libgo2.so and testp2 behind, you'll want to clean them up.)

Can you confirm that running those commands by themselves demonstrates the same failure.

ianlancetaylor · 2018-02-12T05:08:15Z

It's possible that the test will run the wrong version of go; which go command is first on your PATH?

Invizory · 2018-02-12T07:26:48Z

@ianlancetaylor,

It's possible that the test will run the wrong version of go; which go command is first on your PATH?

$ which go
/usr/lib/go-1.9/bin/go
$ go version
go version go1.9.4 linux/amd64

This version is taken from ppa:gophers/archive.

Can you confirm that running those commands by themselves demonstrates the same failure.

Yes, it gives:

$ LD_LIBRARY_PATH=. ./testp2
BUG: failed to read any data from pipe

See also strace output.

ianlancetaylor · 2018-02-12T19:34:28Z

Thanks for the strace output. That was going to be my next question. But what we need is actually the output from strace -f. Can you show that? Thanks.

Invizory · 2018-02-12T23:52:05Z

@ianlancetaylor, oh, you're right, here it is: strace -f -v -s 100 output.

ianlancetaylor · 2018-02-13T00:41:37Z

Thanks. That is strange. I see this:

[pid 18664] pipe([3, 4])                = 0
[pid 18664] dup2(3, 100 <unfinished ...>

and then the call to dup2 never completes, despite many calls to nanosleep for 1000000 nanoseconds. The next PID 18664 shows up is at the very end of the trace when all the processes are exiting.

[pid 18664] <... dup2 resumed>)         = ?
[pid 18664] +++ exited with 2 +++

What could possibly cause dup2 to hang like that? Are you using a FUSE file system of any sort?

Invizory · 2018-02-13T16:20:37Z

@ianlancetaylor,

No, there are no FUSE file systems mounted inside my home directory (mount).

I've also noticed that increasing the number of iterations in misc/cgo/testcshared/main2.c from 1000 to 10000 helps to PASS the test, i.e. dup2 resumes in the libgo2 thread after more iterations.

ianlancetaylor · 2018-02-13T20:20:06Z

Do you have an unusually slow system? 1000 iterations of a sleep for 1000000 nanoseconds is a full second. How could it possibly take a full second for dup2 to complete? It's not like the program is doing anything, it's mostly sleeping. Can you show strace -ftt output?

Invizory · 2018-02-13T21:30:09Z

@ianlancetaylor,

Do you have an unusually slow system?

I have 32×2.2GHz CPU cores with small load average: 0.82, 1.25, 1.12.

Can you show strace -ftt output?

I'm confused because now I can't reproduce the bug even with 1000 iterations. However, I reproduced it with 300 iterations, here is the strace -ftt output.

Invizory · 2018-02-13T21:56:12Z

@ianlancetaylor,

Well, I have reproduced it with 1000 iterations again after 340 more runs of ./testp2, see strace -ftt.

ianlancetaylor · 2018-02-13T22:03:27Z

Thanks. That output does clearly show that dup2 did not complete for more than one second, even though the program spent most of its time doing a loop of nothing but

[pid 12133] 00:48:36.838609 nanosleep({tv_sec=0, tv_nsec=1000000}, NULL) = 0
[pid 12133] 00:48:36.839734 read(100, 0x7fff18a156e0, 20) = -1 EBADF (Bad file descriptor)

By comparison, on my laptop, I see

[pid 238401] 14:01:39.650477 dup2(3, 100 <unfinished ...>
...
[pid 238401] 14:01:39.705113 <... dup2 resumed> ) = 100

so the dup2 call completes in much less than 1/10 second.

I'm grasping at straws, but can you try adding

    sched_yield();

before the call to nanosleep in misc/cgo/testcshared/main2.c?

Invizory · 2018-02-13T22:57:21Z

@ianlancetaylor,

Since it's very difficult to reproduce the bug now (for unknown reasons), I've collected some statistics instead on how much iterations it takes for different versions of code to finish with PASS (sorted raw data).

Here is the histogram for n = 5000 runs, the X-axis (tail stripped) denotes the number of iterations to finish, the Y-axis denotes the number of runs.

.

See also five extremal values:

without sched_yield: 9, 10, 11, 11, 12, ..., 258, 263, 274, 281, 392.
with sched_yield: 11, 11, 12, 12, 12, ..., 256, 259, 272, 334, 340.

So, according to the above mentioned, I don't see any significant difference at all.

gopherbot · 2018-02-14T05:16:19Z

Change https://golang.org/cl/93895 mentions this issue: misc/cgo/testcshared: increase sleep in TestUnexportedSymbols

ianlancetaylor · 2018-02-14T05:16:31Z

Well, I don't know what's going on, but it sounds like an approach like https;//golang.org/cl/93895 will at least make it more likely to pass.

Invizory · 2018-02-14T06:12:41Z

@ianlancetaylor,

Yep, seems like a reasonable approach.

aclements · 2018-04-20T17:01:56Z

This is still failing:

$ greplogs -dashboard -e 'TestUnexportedSymbols' -md -l
2018-03-29T06:09:52-ac7c0ee/linux-386
2018-04-11T16:06:01-e1040d7/linux-386-clang
2018-04-17T14:39:08-58b18cf/linux-amd64
2018-04-17T23:46:11-7ba12c1/linux-386-387

Trybot failures that just happened:
https://storage.googleapis.com/go-build-log/1247948c/linux-amd64_75439f19.log
https://storage.googleapis.com/go-build-log/7c12bc2a/linux-amd64-race_9f0c9a0a.log

ianlancetaylor · 2018-04-20T21:42:35Z

The test currently uses file descriptor 100, which requires the process's fdtable to be expanded. Somehow that seems to take a long time. The other threads running (the main thread written in C and the sysmon thread) are somehow blocking that. I don't have enough kernel-fu to understand how. On my system the dup2 call does block for about 0.03 seconds, while the other threads are busy sleeping in loops. I guess that somehow, sometimes, the call blocks for over 2 seconds.

I'll try a different workaround, which is to only dup to file descriptor 30. That is less than the Linux kernel's default fdtable size of 32 or 64, so the dup2 call now completes immediately without blocking at all.

gopherbot · 2018-04-20T21:44:39Z

Change https://golang.org/cl/108537 mentions this issue: misc/cgo/testcshared: use file descriptor 30 for TestUnexportedSymbols

FiloSottile · 2018-05-07T14:54:15Z

@ianlancetaylor this breaks the trybots on the 1.10 release branch as well, and is probably (?) also an issue on 1.9, can you look into backporting the fix?

https://go-review.googlesource.com/c/go/+/111715#message-c7b7c5f6de1d1ad25c8c7db15599e99e29a392a7

@gopherbot please open a backport issue for 1.10 and 1.9.

gopherbot · 2018-05-07T14:54:22Z

Backport issue(s) opened: #25277 (for 1.10), #25278 (for 1.9).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

gopherbot · 2018-05-07T22:58:00Z

Change https://golang.org/cl/111995 mentions this issue: [release-branch.go1.10] misc/cgo/testcshared: use file descriptor 30 for TestUnexportedSymbols

gopherbot · 2018-05-07T22:59:30Z

Change https://golang.org/cl/111996 mentions this issue: [release-branch.go1.9] misc/cgo/testcshared: use file descriptor 30 for TestUnexportedSymbols

…for TestUnexportedSymbols Backport of CL 108537 to 1.10 release branch. We were using file descriptor 100, which requires the Linux kernel to grow the fdtable size. That step may sometimes require a long time, causing the test to fail. Switch to file descriptor 30, which should not require growing the fdtable. Updates #23784 Fixes #25277 Change-Id: I9d25986f3b59bdeb04aa52407b24aa94712aedff Reviewed-on: https://go-review.googlesource.com/111995 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

…or TestUnexportedSymbols Backport of CL 108537 to 1.9 release branch. We were using file descriptor 100, which requires the Linux kernel to grow the fdtable size. That step may sometimes require a long time, causing the test to fail. Switch to file descriptor 30, which should not require growing the fdtable. Updates #23784 Fixes #25278 Change-Id: I19ea6ab1724ec1807643d5111c44631e20be76b0 Reviewed-on: https://go-review.googlesource.com/111996 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

danicat · 2018-07-31T23:57:17Z

Hi everyone! I've just downloaded the go code and ran the all.bash script to compile without any changes and got the following error (that resembles the original message):

##### ../misc/cgo/testcshared
--- FAIL: TestUnexportedSymbols (0.64s)
    cshared_test.go:353: run: [go build -buildmode=c-shared -installsuffix testcshared -o libgo2.so libgo2]
    cshared_test.go:367: run: [gcc -fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build783986620=/tmp/go-build -gno-record-gcc-switches -I pkg/linux_amd64_shared -o testp2 main2.c -Wl,--no-as-needed libgo2.so]
    cshared_test.go:373: run: [./testp2]
    cshared_test.go:376: # NAN ChangeLog
**
FAIL
2018/07/31 23:09:06 Failed: exit status 1

I've tried two other times and the message on the last line (after "cshared_test.go:376") changed all times, suggesting some random values were being read from the fd 30. I've recompiled the binaries to use fd 100 like it was originally and I got the tests to pass.

go env:

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/danielapetruzalek/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/danielapetruzalek/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build074359831=/tmp/go-build -gno-record-gcc-switches"

I'm testing Go on the new Google Pixelbook.

I wonder if instead of choosing an arbitrary number we could have a more safe way to select the fd? Or use another technique to run this test?

ianlancetaylor · 2018-08-01T00:16:03Z

@danicat This issue is closed. Please open a new issue for that problem. Thanks.

danicat · 2018-08-01T00:52:11Z

Sure @ianlancetaylor. Done!

ianlancetaylor added this to the Go1.10 milestone Feb 12, 2018

ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure. labels Feb 12, 2018

ianlancetaylor modified the milestones: Go1.10, Go1.11 Feb 13, 2018

gopherbot closed this as completed in 3773cbb Feb 14, 2018

aclements reopened this Apr 20, 2018

ianlancetaylor added the release-blocker label Apr 20, 2018

gopherbot closed this as completed in 0a129db Apr 21, 2018

gopherbot mentioned this issue May 7, 2018

build: misc/cgo/testcshared failure [1.10 backport] #25277

Closed

gopherbot mentioned this issue May 7, 2018

build: misc/cgo/testcshared failure [1.9 backport] #25278

Closed

danicat mentioned this issue Aug 1, 2018

misc/cgo/testcshared: test failure on Google Pixelbook #26730

Closed

golang locked and limited conversation to collaborators Aug 1, 2019

gopherbot added the FrozenDueToAge label Aug 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

build: misc/cgo/testcshared failure #23784

build: misc/cgo/testcshared failure #23784

Invizory commented Feb 12, 2018

ianlancetaylor commented Feb 12, 2018

ianlancetaylor commented Feb 12, 2018

Invizory commented Feb 12, 2018 •

edited

ianlancetaylor commented Feb 12, 2018

Invizory commented Feb 12, 2018 •

edited

ianlancetaylor commented Feb 13, 2018

Invizory commented Feb 13, 2018 •

edited

ianlancetaylor commented Feb 13, 2018

Invizory commented Feb 13, 2018

Invizory commented Feb 13, 2018

ianlancetaylor commented Feb 13, 2018

Invizory commented Feb 13, 2018 •

edited

gopherbot commented Feb 14, 2018

ianlancetaylor commented Feb 14, 2018

Invizory commented Feb 14, 2018

aclements commented Apr 20, 2018

ianlancetaylor commented Apr 20, 2018

gopherbot commented Apr 20, 2018

FiloSottile commented May 7, 2018

gopherbot commented May 7, 2018

gopherbot commented May 7, 2018

gopherbot commented May 7, 2018

danicat commented Jul 31, 2018

ianlancetaylor commented Aug 1, 2018

danicat commented Aug 1, 2018

build: misc/cgo/testcshared failure #23784

build: misc/cgo/testcshared failure #23784

Comments

Invizory commented Feb 12, 2018

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

ianlancetaylor commented Feb 12, 2018

ianlancetaylor commented Feb 12, 2018

Invizory commented Feb 12, 2018 • edited

ianlancetaylor commented Feb 12, 2018

Invizory commented Feb 12, 2018 • edited

ianlancetaylor commented Feb 13, 2018

Invizory commented Feb 13, 2018 • edited

ianlancetaylor commented Feb 13, 2018

Invizory commented Feb 13, 2018

Invizory commented Feb 13, 2018

ianlancetaylor commented Feb 13, 2018

Invizory commented Feb 13, 2018 • edited

gopherbot commented Feb 14, 2018

ianlancetaylor commented Feb 14, 2018

Invizory commented Feb 14, 2018

aclements commented Apr 20, 2018

ianlancetaylor commented Apr 20, 2018

gopherbot commented Apr 20, 2018

FiloSottile commented May 7, 2018

gopherbot commented May 7, 2018

gopherbot commented May 7, 2018

gopherbot commented May 7, 2018

danicat commented Jul 31, 2018

ianlancetaylor commented Aug 1, 2018

danicat commented Aug 1, 2018

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

Invizory commented Feb 12, 2018 •

edited

Invizory commented Feb 12, 2018 •

edited

Invizory commented Feb 13, 2018 •

edited

Invizory commented Feb 13, 2018 •

edited