Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: many tests are flaky: Error while expecting an RSTStream frame: timeout waiting for frame #18111

Closed
dsnet opened this issue Nov 30, 2016 · 13 comments
Labels
FrozenDueToAge Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Nov 30, 2016

https://storage.googleapis.com/go-build-log/0b1b6d4c/openbsd-amd64-gce58_f9489e71.log

--- FAIL: TestServer_Request_Reject_Pseudo_scheme_invalid (2.02s)
	server_test.go:475: Error while expecting an RSTStream frame: timeout waiting for frame
--- FAIL: TestServer_Request_Connect_InvalidPath (2.03s)
	server_test.go:475: Error while expecting an RSTStream frame: timeout waiting for frame
FAIL
FAIL	golang.org/x/net/http2	44.630s
@dsnet dsnet added this to the Unplanned milestone Nov 30, 2016
@dsnet dsnet changed the title x/net/nettest: TestServer_Request_Connect_InvalidPath is flaky on openbsd x/net/http2: TestServer_Request_Connect_InvalidPath is flaky on openbsd Nov 30, 2016
@bradfitz bradfitz modified the milestones: Go1.9, Unplanned Nov 30, 2016
@bradfitz
Copy link
Contributor

/cc @tombergan

@dsnet dsnet added the Testing An issue that has been verified to require only test changes, not just a test failure. label Dec 8, 2016
@bradfitz bradfitz modified the milestones: Go1.8, Go1.9 Dec 8, 2016
@bradfitz
Copy link
Contributor

bradfitz commented Dec 8, 2016

We're seeing this across the board on basically on all GOOS now.

@bradfitz
Copy link
Contributor

bradfitz commented Dec 8, 2016

Closed dup #18235. See some debugging notes there.

@bradfitz bradfitz changed the title x/net/http2: TestServer_Request_Connect_InvalidPath is flaky on openbsd x/net/http2: many tests are flaky: Error while expecting an RSTStream frame: timeout waiting for frame Dec 8, 2016
@bradfitz
Copy link
Contributor

bradfitz commented Dec 8, 2016

With @tombergan's new debugging,

https://build.golang.org/log/b8e6726f6ff2fecdaebc69ebbd16e2f3c34a7ea4

--- FAIL: TestServer_Rejects_HeadersSelfDependence (2.07s)
	server_test.go:510: Error while expecting an RSTStream frame: timeout waiting for frame
	server_test.go:229: Framer read log:
		2016-12-08 17:00:48.731964688 Framer 0x3a776c00: read SETTINGS len=18, settings: MAX_FRAME_SIZE=1048576, MAX_CONCURRENT_STREAMS=250, MAX_HEADER_LIST_SIZE=1048896
		2016-12-08 17:00:48.732088456 Framer 0x3a776c00: read SETTINGS flags=ACK len=0
	server_test.go:235: Framer write log:
		2016-12-08 17:00:48.700756846 Framer 0x3a776c00: wrote SETTINGS len=0
		2016-12-08 17:00:48.73199436 Framer 0x3a776c00: wrote SETTINGS flags=ACK len=0
		2016-12-08 17:00:48.732123435 Framer 0x3a776c00: wrote HEADERS flags=END_STREAM|END_HEADERS|PRIORITY stream=1 len=21
FAIL
FAIL	golang.org/x/net/http2	24.016s

@bradfitz
Copy link
Contributor

bradfitz commented Dec 8, 2016

Oh, is this due to "http2: schedule RSTStream writes onto its stream's queue" (golang/net@00ed5e9) ?

@tombergan
Copy link
Contributor

Are you suggesting the stream is somehow closed before the RSTStream is flushed from the queue? A stream's queue is wiped when the stream is closed, so this is vaguely possible, although I haven't thought through the specific buggy sequence.

@tombergan
Copy link
Contributor

If that's the problem, it might also be related to the remaining flake here:
#17243 (comment)

@bradfitz
Copy link
Contributor

bradfitz commented Dec 8, 2016

I haven't thought through it all yet either. But that's my hunch, since a bunch of tests (all involving RSTStream frames) are now failing where the tests had been stable for ages.

@tombergan
Copy link
Contributor

@bradfitz
Copy link
Contributor

bradfitz commented Dec 8, 2016

Yeah, that should probably work. Only thing I'd be concerned about is what else could happen in the meantime (between the .resetStream() call and the write of the frame and the delayed close stream) that would differ in behavior. For instance, which incoming frames would then see a not-closed frame and do something whereas before they'd get dropped or get more rejects?

Alternative, just for completeness (but not well-considered) is: the sc.writeSched.CloseStream(st.id) in closeStream could maybe go through its list of queued frames and promote some to the global queue instead of the per-stream queue. But that feels a little gross and violating the abstraction. But maybe it's lower risk?

Your fix sounds good, though. Want to send a CL? It probably doesn't need a test, since we have tons of failing builders already which are effectively the test. If the existing tests pass, that's a good sign.

@tombergan
Copy link
Contributor

Sure, I'll send a CL soonish.

@gopherbot
Copy link

CL https://golang.org/cl/34238 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/34495 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 15, 2016
Updates bundled x/net/http2 to git rev 1195a05d for:

    http2: fix incorrect panic
    https://golang.org/cl/34498

    http2: fix race in writePushPromise
    https://golang.org/cl/34493

    http2: speed up TestTransportFlowControl in short mode
    https://golang.org/cl/33241

    http2: don't flush a stream's write queue in sc.resetStream
    https://golang.org/cl/34238

    http2: allow Transport to connect to https://[v6literal]/ without port
    https://golang.org/cl/34143

    http2: log Framer reads and writes when a server test fails
    https://golang.org/cl/34130

Updates #18326
Updates #18273
Updates #18111
Updates #18248
Updates #18235

Change-Id: I18c7a297fc94d6a843284efcfc43e0fdab9b5f41
Reviewed-on: https://go-review.googlesource.com/34495
Run-TryBot: Chris Broadfoot <cbro@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators Dec 15, 2017
c3mb0 pushed a commit to c3mb0/net that referenced this issue Apr 2, 2018
resetStream(st) queues a RST_STREAM frame and then calls closeStream(st).
Unfortunately, closeStream(st) flushes any writes pending on st, which
can drop the RST_STREAM that was just queued. (If we are lucky, the
RST_STREAM will fit in the send buffer and won't be dropped, however,
if we are unlucky the RST_STREAM will be dropped.)

I fixed this bug by removing closeStream(st) from resetStream. Instead,
closeStream happens after the RST_STREAM frame is written. This is a more
direct implementation of the diagram in RFC 7540 Section 5.1, which says
that a stream does not transition to "closed" until after the RST_STREAM
has been sent.

A side-effect is that the stream may stay open for longer than it did
previously (since it won't close until *after* the RST_STREAM frame is
actually written). Situations like the following are a problem:

- Client sends a DATA frame that exceeds its flow-control window
- Server returns streamError(ErrCodeFlowControl) from processData
- RST_STREAM is queued behind other frames
- Server process the request body and releases flow-control
- Client sends another DATA frame, this one fits in the flow-control
  window. Server should NOT process this frame.

To avoid the above problem, we set a bool st.resetQueued=true when
RST_STREAM is queued, then ignore all future incoming HEADERS and DATA
frames on that stream.

I also removed st.sentReset and st.gotReset, which were used to ignore
frames after RST_STREAM is sent. Now we just check if the stream is closed.

Fixes golang/go#18111

Change-Id: Ieb7c848989431add5b7d95f40d6d91db7edc4980
Reviewed-on: https://go-review.googlesource.com/34238
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

4 participants