Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: apparent deadlock in TestSOCKS5Proxy on plan9-arm builder #38769

Closed
bcmills opened this issue Apr 30, 2020 · 6 comments
Closed

net/http: apparent deadlock in TestSOCKS5Proxy on plan9-arm builder #38769

bcmills opened this issue Apr 30, 2020 · 6 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9 WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Apr 30, 2020

#!watchflakes
post <- builder == "plan9-arm" && pkg == "net/http" && test == "TestSOCKS5Proxy"

This looks like a possible deadlock involving the network poller on Plan 9:

goroutine 4622 [chan receive, 17 minutes]:
internal/poll.(*asyncIO).Wait(...)
	/boot/workdir/go/src/internal/poll/fd_io_plan9.go:84
internal/poll.(*FD).Read(0x10e5d730, 0x10daeca8, 0x111bc000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
	/boot/workdir/go/src/internal/poll/fd_plan9.go:73 +0x124
net.(*netFD).Read(0x10e5d730, 0x111bc000, 0x8000, 0x8000, 0x111bc000, 0x4b, 0x8000)
	/boot/workdir/go/src/net/fd_plan9.go:86 +0xb0
net.(*conn).Read(0x11097078, 0x111bc000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
	/boot/workdir/go/src/net/net.go:182 +0x6c
io.copyBuffer(0x4c14a0, 0x11097080, 0x10b400b0, 0x11097078, 0x111bc000, 0x8000, 0x8000, 0x10c1aa48, 0x10b47070, 0x3ff, ...)
	/boot/workdir/go/src/io/io.go:409 +0xd8
io.Copy(...)
	/boot/workdir/go/src/io/io.go:368
net.genericReadFrom(0x4c0ca8, 0x11097000, 0x10b400b0, 0x11097078, 0x3, 0x10e49a00, 0x10b50f70, 0x6e1a4)
	/boot/workdir/go/src/net/net.go:639 +0x7c
net.(*TCPConn).readFrom(...)
	/boot/workdir/go/src/net/tcpsock_plan9.go:14
net.(*TCPConn).ReadFrom(0x11097000, 0x10b400b0, 0x11097078, 0x10b47070, 0x11097000, 0x10b40001, 0x3f048)
	/boot/workdir/go/src/net/tcpsock.go:103 +0x44
io.copyBuffer(0x4c0ca8, 0x11097000, 0x10b400b0, 0x11097078, 0x0, 0x0, 0x0, 0x0, 0xf, 0x0, ...)
	/boot/workdir/go/src/io/io.go:395 +0x2ec
io.Copy(...)
	/boot/workdir/go/src/io/io.go:368
net/http_test.TestSOCKS5Proxy.func1(0x10efccc0)
	/boot/workdir/go/src/net/http/transport_test.go:1280 +0x930
created by net/http_test.TestSOCKS5Proxy.func3
	/boot/workdir/go/src/net/http/transport_test.go:1302 +0x94

2020-04-28T21:46:16-b1b6784/plan9-arm
2020-04-27T16:18:30-f1a2a0e/plan9-arm
2020-04-24T01:48:48-f5fcc9b/plan9-arm
2020-04-21T00:09:00-eec981e/plan9-arm
2020-04-11T16:20:57-b78109e/plan9-arm
2020-03-15T08:14:24-dc32553/plan9-arm
2020-02-24T15:55:01-90c71ce/plan9-arm
2020-02-22T04:31:20-f9c51e3/plan9-arm
2020-02-13T16:25:03-e237df5/plan9-arm
2020-01-19T14:04:09-8e0be05/plan9-arm
2020-01-02T22:26:56-73b657e/plan9-arm
2019-11-25T23:37:03-87805c9/plan9-arm
2019-11-15T22:47:41-72f333a/plan9-arm
2019-11-15T21:02:07-d856e05/plan9-arm
2019-11-12T05:35:33-99957b6/plan9-arm
2019-11-01T21:32:33-08a72c8/plan9-arm
2019-10-24T13:58:59-05ee506/plan9-arm
2019-10-16T00:05:37-c4817f5/plan9-arm
2019-10-10T22:38:38-03ef105/plan9-arm
2019-10-07T18:50:14-a0894ea/plan9-arm
2019-10-04T21:22:55-5f4aa5d/plan9-arm
2019-09-21T21:01:41-9c0e56b/plan9-arm
2019-09-20T05:02:16-66e562c/plan9-arm
2019-09-18T18:32:20-e70e0a6/plan9-arm
2019-09-16T21:37:16-0e0bff8/plan9-arm
2019-08-28T18:38:52-7ed973b/plan9-arm
2019-08-06T11:13:05-546ea78/plan9-arm
2019-07-29T20:29:54-7b8234b/plan9-arm
2019-06-14T18:41:07-9838f4d/plan9-arm
2019-06-05T23:57:39-e9073ef/plan9-arm
2019-06-05T21:21:59-fc70527/plan9-arm
2019-04-29T17:42:27-08b956f/plan9-arm

CC @millerresearch @ianlancetaylor @0intro

@bcmills bcmills added OS-Plan9 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 30, 2020
@bcmills bcmills added this to the Unplanned milestone Apr 30, 2020
@millerresearch
Copy link
Contributor

millerresearch commented May 20, 2020

This might not be specific to TestSOCKS5Proxy. There are similar looking http deadlocks where that test is not running, for example today.

But internal/poll/fd*_plan9.go still seems to be involved here too.

@millerresearch
Copy link
Contributor

Here's a possibly related clue: I'm monitoring a test machine now which seems to have deadlocked in the x/net/http2 test, and the last message on the system console is:

Fsnewcall: incall queue full (64) on port 39931

It may be that too many simultaneous connection attempts are causing some to be dropped. I would have expected an error condition that the test would notice, not a deadlock, but I'll see if increasing the queue size helps.

@millerresearch
Copy link
Contributor

I'll see if increasing the queue size helps.

It didn't.

@millerresearch
Copy link
Contributor

Looking at internal/poll/fd*_plan9.go , I think there are at least three potential data races when a network i/o is being started and cancelled concurrently in different threads. An example of this situation, seen in the logs cited above, is an http server calling http.(*connReader).abortPendingRead to cancel an i/o which is being started by http.(*connReader).startBackgroundRead.

  1. Cancelling a read by setting a deadline in the past: (*FD).Read checks fd.rtimedout and finds it false, so it calls newAsyncIO to launch a goroutine to perform the i/o. Now setDeadlineImpl sets fd.rtimedout to true (too late), checks fd.raio and finds the i/o goroutine hasn't been launched, so it returns without cancelling it. Then newAsyncIO launches the i/o goroutine, and returns to Read which sets fd.raio (too late). Meanwhile the i/o goroutine starts a read syscall which may never terminate (that's why it needed to be cancelled). The callers of Read and SetReadDeadline both deadlock waiting for the read to end.

  2. Deadline expiry while async i/o is being launched: setDeadlineImpl registers a timer function whose purpose is to set fd.rtimedout and cancel the i/o goroutine if it's running. On a heavily loaded system, the i/o goroutine may be created after a delay, just as the timer interval expires, exposing the same races on fd.rtimedout and fd.raio as in 1. above.

  3. Signalling the async i/o goroutine before the read syscall is executed: in this sequence setDeadlineImpl, or its timer expiry function, observes that fd.raio is set, and calls (*asyncIO).Cancel to send a hangup signal to the OS process running the i/o goroutine. The intention is to interrupt the read syscall so control can return to the i/o goroutine, allowing that to send a finish message to (*asyncIO).Wait. But if the hangup arrives before the syscall instruction has been executed, there's nothing to interrupt: the runtime.sighandler function will simply ignore the signal (because of runtime.ignoreHangup), and the read syscall will then proceed and possibly never terminate.

@gopherbot
Copy link

Change https://golang.org/cl/235820 mentions this issue: internal/poll: add mutex to prevent SetDeadline race in Plan 9

gopherbot pushed a commit that referenced this issue Jun 2, 2020
There are data races on fd.[rw]aio and fd.[rw]timedout when Read/Write
is called on a polled fd concurrently with SetDeadline (see #38769).
Adding a mutex around accesses to each pair (read and write) prevents
the race, which was causing deadlocks in net/http tests on the builders.

Updates #38769.

Change-Id: I31719b3c9a664e81a775cda583cff31c0da946c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/235820
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David du Colombier <0intro@gmail.com>
@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 16, 2023
@gopherbot
Copy link

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@golang golang locked and limited conversation to collaborators Apr 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9 WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
Status: Done
Development

No branches or pull requests

3 participants