internal/poll: ppc64le stuck waiting for semaphore #23111

Closed
tophj-ibm opened this issue Dec 12, 2017 · 9 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker

@tophj-ibm

Note: I haven't finished debugging this, but after discussing it with @laboger, we thought it was important enough to raise an issue for the beta. This issue involves docker + Go master (including 1.10beta1). At the moment I think it's a Go issue, because it only occurs on ppc64le, and docker and its main dependencies don't have much architecture-specific code.

What version of Go are you using (go version)?

go1.10beta1

Does this issue reproduce with the latest release?

Yes with the 1.10 beta; no with 1.9.2

What operating system and processor architecture are you using (go env)?

Ubuntu 16.04 host, Debian:Stretch container

GOARCH="ppc64le"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_ppc64le"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build571013459=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Testing upstream docker with go version 1.10beta1

Set GO_VERSION to go1.10beta1 in Dockerfile.ppc64le, and on ppc64le run
TESTFLAGS="-check.f DockerSuite.TestAttachClosedOnContainerStop" make test-integration

(This runs this specific test https://github.com/moby/moby/blob/d65ab869e8712d08fb94a5337b83df5d247bf25b/integration-cli/docker_cli_update_unix_test.go#L272)

What did you expect to see?

PASS

What did you see instead?

FAIL / panic (test timed out)

https://gist.github.com/tophj-ibm/4049252d8ad8227c8aad45598d08750a

The test, and a few other tests, get stuck at cmd.Wait(), waiting indefinitely for a semaphore.

We've narrowed it down to commit 382d492. But since that commit is basically a check to make sure fds are closed, I think there is an underlying issue somewhere with an fd not being closed correctly.

Again, I'm still trying to trace what is happening, but if you have any ideas, all help is appreciated.

All the go tests pass.

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Dec 13, 2017
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Dec 13, 2017
@ianlancetaylor
Contributor

It's very unclear to me why anything around here would be ppc64le specific. Does this test pass on other architectures?

@crvv
Contributor

crvv commented Dec 13, 2017

A goroutine is stuck at this line, in a pty.Read() call. The pty is set to blocking mode by an (*os.File).Fd() call.
https://github.com/moby/moby/blob/d65ab869e8712d08fb94a5337b83df5d247bf25b/integration-cli/docker_cli_attach_unix_test.go#L54
https://github.com/kr/pty/blob/95d05c1eef33a45bd58676b6ce28d105839b8d0b/pty_linux.go#L35

An (*os.File).Read() call on a blocking mode file will block (*os.File).Close().

On Linux, I guess this can be fixed by adding syscall.SetNonblock(int(f.Fd()), true) somewhere.
If the file is not opened by os.OpenFile, setting non-block won't help.
https://github.com/kr/pty/blob/95d05c1eef33a45bd58676b6ce28d105839b8d0b/pty_freebsd.go#L11

I encountered this before so I opened #22939.

@ianlancetaylor
Contributor

@crvv Thanks for the explanation. I think I see the problem and I will send a CL.

@gopherbot

Change https://golang.org/cl/83715 mentions this issue: net, os: don't wait for Close in blocking mode

@tophj-ibm
Author

I just tested the CL with commit c7a0e4b6ce and everything fails the same way. I should have mentioned that this test does pass on at least amd64, and I suspected Go because we've had issues with syscalls being wrong in the past.

@ianlancetaylor
Contributor

@tophj-ibm Thanks for trying it. I think the CL is still a good one, and it comes with a test case that it fixes, but I don't know what is happening here. Are you able to debug this further?

@tophj-ibm
Author

tophj-ibm commented Dec 13, 2017

I'm still looking at it, yes. Also thanks for getting to this so fast!

gopherbot pushed a commit that referenced this issue Dec 14, 2017
Updates #7970
Updates #21856
Updates #23111

Change-Id: I0cd0151fcca740c40c3c976f941b04e98e67b0bf
Reviewed-on: https://go-review.googlesource.com/83715
Reviewed-by: Russ Cox <rsc@golang.org>
@gopherbot

Change https://golang.org/cl/83995 mentions this issue: os: don't wait for Close if the File was returned by NewFile

gopherbot pushed a commit that referenced this issue Dec 14, 2017
os.NewFile doesn't put the fd into non-blocking mode.
In most cases, an *os.File returned by os.NewFile is in blocking mode.

Updates #7970
Updates #21856
Updates #23111

Change-Id: Iab08432e41f7ac1b5e25aaa8855d478adb7f98ed
Reviewed-on: https://go-review.googlesource.com/83995
Reviewed-by: Ian Lance Taylor <iant@golang.org>
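
To illustrate the commit message above, here is a small sketch, assuming Linux, of an *os.File created with os.NewFile from a raw descriptor that was opened in blocking mode. The helper names `rawIsNonblocking` and `wrapRawPipe` are hypothetical. It only inspects the `O_NONBLOCK` flag on the raw descriptor and does not reproduce the Close behaviour changed by the CL.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rawIsNonblocking reports whether O_NONBLOCK is set on a raw descriptor,
// using fcntl(F_GETFL) so that (*os.File).Fd() is never called.
func rawIsNonblocking(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFL, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.O_NONBLOCK != 0, nil
}

// wrapRawPipe (hypothetical helper) creates a pipe with syscall.Pipe,
// which leaves both ends in blocking mode, and wraps the ends with
// os.NewFile. It also returns the raw read descriptor for inspection.
func wrapRawPipe() (r, w *os.File, rawRead int, err error) {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return nil, nil, -1, err
	}
	r = os.NewFile(uintptr(p[0]), "pipe-read")
	w = os.NewFile(uintptr(p[1]), "pipe-write")
	return r, w, p[0], nil
}

func main() {
	r, w, rawRead, err := wrapRawPipe()
	if err != nil {
		panic(err)
	}
	defer r.Close()
	defer w.Close()

	nb, err := rawIsNonblocking(rawRead)
	if err != nil {
		panic(err)
	}
	// os.NewFile wraps the descriptor without switching it to
	// non-blocking mode, so the flag reflects how the fd was created.
	fmt.Println("O_NONBLOCK set after os.NewFile:", nb)
}
```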
@tophj-ibm
Author

Just re-tested with both CLs cherry-picked into 1.10beta1, and they fix the issue. Thanks guys 🎉

I'm still going to look around because I'm not sure why the file would be blocking just on power and not on anything else, but I think it's safe to close this.

@golang golang locked and limited conversation to collaborators Dec 14, 2018