New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http: TestInterruptWithPanic_h2 flakes on ppc64le builder #17243
Comments
@aclements Can you run your flake finder tool on this? |
Looks like it's been flaking on more than just PPC:
Flake analysis isn't very helpful with this one, I'm afraid:
|
I can't reproduce this. I deflaked this test in 238247e but that commit message is misleading. It was only deflaking additions added in https://golang.org/cl/33099 and https://golang.org/cl/33103. It looks like the original flake code remains unchanged. In particular, the test handler does this: cst := newClientServerTest(t, h2, HandlerFunc(func(w ResponseWriter, r *Request) {
io.WriteString(w, msg)
w.(Flusher).Flush()
panic(panicValue)
}), func(ts *httptest.Server) { That means the Handler's http2 @tombergan, did you fix this by accident when you redid the http2 write scheduler? Or, can you imagine why we'd schedule a |
Here's one hypothesis. sc.resetStream constructs a FrameWriteRequest that doesn't have a stream field: Both the old and new write schedulers use the stream field to determine which queue gets the FrameWriteRequest: This means RST_STREAM frames will be added to the root queue (for control frames), which is processed before the stream queues. We could have this sequence of events:
The test will pass if step 3 happens before step 2, but will fail in the above order. Both the old and new write schedulers have this bug -- more precisely, the bug is in sc.resetStream. This bug can explain the flake. However, I can't explain why the test has stopped flaking. Perhaps something changed in the buffering code which made the above order much less likely? |
To be clear, I've never been able to reproduce it. And it hasn't happened many times I don't think. |
(In a meeting now, but your theory looks valid at first glance.) Do you want to prepare a fix? Or I can send you something? |
A bit busy today, but I can get to this tomorrow if you don't get there first. |
CL https://golang.org/cl/33238 mentions this issue. |
CL https://golang.org/cl/33241 mentions this issue. |
Updates x/net/http2 to x/net git rev 00ed5e9 for: http2: schedule RSTStream writes onto its stream's queue https://golang.org/cl/33238 Fixes #17243 Change-Id: I79cc5d15bf69ead28d549d4f798c12f4ee2a2201 Reviewed-on: https://go-review.googlesource.com/33241 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
Apparently golang/net@00ed5e9 wasn't enough. I still see a flake:
|
I'm going to deflake this a different way, without fixing the http2 issue for now. |
CL https://golang.org/cl/33332 mentions this issue. |
Updates #17243 Change-Id: Iaa737874e75fdac73452f1fc13a5749e8df78ebe Reviewed-on: https://go-review.googlesource.com/33332 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
TODO(bradfitz): file bug about the general problem and theory and then close this one. |
I'll repurpose this bug about the x/net/http2 behavior. It seems that even with our fix in https://golang.org/cl/33238 (and bundling it into std), this flaky test persists. Perhaps the problem is now in the Transport instead of the Server. Investigate for Go 1.9. |
That is, if the Transport sees a HEADERS frame on the wire and then "immediately" sees a RST_STREAM, is it possible for the Transport to report an error instead of its RoundTrip returning a non-nil *Response and nil error? |
@tombergan, I think between the Transport and Server changes, we might've fixed this one? |
If it hasn't been flaking, I say close it. |
Fixes golang/go#17243 Change-Id: I76f972f908757b103e2ab8d9b1701312308d66e5 Reviewed-on: https://go-review.googlesource.com/33238 Reviewed-by: Tom Bergan <tombergan@google.com>
The failure looks like this:
It's happened three times in the last three screens of build.golang.org:
https://build.golang.org/log/969151e16737eda8c8720b1c0c3cb6f0ddbe5025
https://build.golang.org/log/ae7a7294c3d83cc5584aabeec8bb4fe55b9e4832
https://build.golang.org/log/512a378cc11774b203aa0962c9818326eda0cd76
but I have the vague feeling it's been going on for a while.
The text was updated successfully, but these errors were encountered: