Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: new TestDontCacheBrokenHTTP2Conn fails #35113

Closed
cuonglm opened this issue Oct 23, 2019 · 23 comments
Closed

net/http: new TestDontCacheBrokenHTTP2Conn fails #35113

cuonglm opened this issue Oct 23, 2019 · 23 comments

Comments

@cuonglm
Copy link
Member

cuonglm commented Oct 23, 2019

Build log: https://storage.googleapis.com/go-build-log/76c387a0/linux-amd64-race_356de3ff.log

--- FAIL: TestDontCacheBrokenHTTP2Conn (0.03s)
    transport_test.go:5885: for iteration 5, doBreak=false; unexpected error Get "https://127.0.0.1:35315": some write error
    transport_test.go:5889: GotConn calls = 0; want 1
FAIL
FAIL	net/http	8.568s
FAIL
2019/10/23 15:34:13 Failed: exit status 1

@cuonglm
Copy link
Member Author

cuonglm commented Oct 23, 2019

This happens when I test https://go-review.googlesource.com/c/go/+/202797.

Is that a known flaky one cc @bradfitz @bcmills

@bradfitz
Copy link
Contributor

Sigh

@bradfitz
Copy link
Contributor

This was just introduced in https://go-review.googlesource.com/c/go/+/202078 for #34978 (and backported to Go 1.13, sadly too early apparently).

Looking.

@bradfitz bradfitz self-assigned this Oct 23, 2019
@bradfitz bradfitz changed the title TestDontCacheBrokenHTTP2Conn fails net/http: new TestDontCacheBrokenHTTP2Conn fails Oct 23, 2019
@bradfitz
Copy link
Contributor

I can repro:

bradfitzlaptop http$ go test -v -run=TestDontCacheBrokenHTTP2Conn -race -count=50
=== RUN   TestDontCacheBrokenHTTP2Conn
--- PASS: TestDontCacheBrokenHTTP2Conn (0.04s)
=== RUN   TestDontCacheBrokenHTTP2Conn
--- PASS: TestDontCacheBrokenHTTP2Conn (0.04s)
=== RUN   TestDontCacheBrokenHTTP2Conn
--- PASS: TestDontCacheBrokenHTTP2Conn (0.05s)
=== RUN   TestDontCacheBrokenHTTP2Conn
--- PASS: TestDontCacheBrokenHTTP2Conn (0.04s)
=== RUN   TestDontCacheBrokenHTTP2Conn
    TestDontCacheBrokenHTTP2Conn: transport_test.go:5885: for iteration 5, doBreak=false; unexpected error Get "https://127.0.0.1:58689": some write error
    TestDontCacheBrokenHTTP2Conn: transport_test.go:5889: GotConn calls = 0; want 1
--- FAIL: TestDontCacheBrokenHTTP2Conn (0.03s)
panic: Log in goroutine after TestDontCacheBrokenHTTP2Conn has completed

goroutine 205 [running]:
testing.(*common).logDepth(0xc000192100, 0xc00002a460, 0x4c, 0x3)
	/Users/bradfitz/go/src/testing/testing.go:679 +0x67c
testing.(*common).log(...)
	/Users/bradfitz/go/src/testing/testing.go:661
testing.(*common).Errorf(0xc000192100, 0x1822436, 0x19, 0xc0001a5678, 0x1, 0x1)
	/Users/bradfitz/go/src/testing/testing.go:710 +0x90
net/http_test.TestDontCacheBrokenHTTP2Conn.func2(0x1813c27, 0x3, 0xc00047b1d0, 0xf, 0xc000238360, 0x1041a70, 0x1d25470, 0x1d25470)
	/Users/bradfitz/go/src/net/http/transport_test.go:5849 +0x123
net/http.(*Transport).dial(0xc000384000, 0x1917980, 0xc00046b860, 0x1813c27, 0x3, 0xc00047b1d0, 0xf, 0x0, 0x205, 0x0, ...)
	/Users/bradfitz/go/src/net/http/transport.go:1035 +0xf8
net/http.(*Transport).dialConn(0xc000384000, 0x1917980, 0xc00046b860, 0x0, 0xc0003b8500, 0x5, 0xc00047b1d0, 0xf, 0x0, 0x0, ...)
	/Users/bradfitz/go/src/net/http/transport.go:1457 +0x2b1c
net/http.(*Transport).dialConnFor(0xc000384000, 0xc0003746e0)
	/Users/bradfitz/go/src/net/http/transport.go:1300 +0x150
created by net/http.(*Transport).queueForDial
	/Users/bradfitz/go/src/net/http/transport.go:1269 +0x68b
exit status 2
FAIL	net/http	0.386s

@bcmills
Copy link
Contributor

bcmills commented Oct 23, 2019

and backported to Go 1.13, sadly too early apparently

Note that #35087 never actually got the CherryPickApproved label. I was kind of intentionally holding off on pinging it to give the CL a bit of soak time, but I guess I should have been more explicit about that.

@bradfitz
Copy link
Contributor

Soaking before cherry-picking is new to me. That's a thing we do now? I mean, not a terrible idea, but first I'd heard.

@bcmills
Copy link
Contributor

bcmills commented Oct 23, 2019

It's a thing I've been doing informally, at least. (#31887) Not a formal policy, as far as I'm aware.

@bcmills
Copy link
Contributor

bcmills commented Oct 23, 2019

At any rate: for the race condition!

Would it make sense to add synchronization between the Dial hook and the loop, so that we know that all of the Conns returned by Dial have been Closed before we move to the next iteration? I suspect that the problem here is the aliasing on brokenState.

@cuonglm
Copy link
Member Author

cuonglm commented Oct 24, 2019

@bradfitz not sure it's related, but if I apply this patch, the test won't fail even with -count=500:

diff --git a/src/net/http/transport_test.go b/src/net/http/transport_test.go
index f76530b8fa..b60c164630 100644
--- a/src/net/http/transport_test.go
+++ b/src/net/http/transport_test.go
@@ -5871,9 +5871,7 @@ func TestDontCacheBrokenHTTP2Conn(t *testing.T) {
                        TLSHandshakeDone: func(cfg tls.ConnectionState, err error) {
                                brokenState.Lock()
                                defer brokenState.Unlock()
-                               if doBreak {
-                                       brokenState.broken = true
-                               }
+                               brokenState.broken = doBreak
                        },
                })
                req, err := NewRequestWithContext(ctx, "GET", cst.ts.URL, nil)

@cuonglm
Copy link
Member Author

cuonglm commented Oct 25, 2019

@bradfitz not sure it's related, but if I apply this patch, the test won't fail even with -count=500:

diff --git a/src/net/http/transport_test.go b/src/net/http/transport_test.go
index f76530b8fa..b60c164630 100644
--- a/src/net/http/transport_test.go
+++ b/src/net/http/transport_test.go
@@ -5871,9 +5871,7 @@ func TestDontCacheBrokenHTTP2Conn(t *testing.T) {
                        TLSHandshakeDone: func(cfg tls.ConnectionState, err error) {
                                brokenState.Lock()
                                defer brokenState.Unlock()
-                               if doBreak {
-                                       brokenState.broken = true
-                               }
+                               brokenState.broken = doBreak
                        },
                })
                req, err := NewRequestWithContext(ctx, "GET", cst.ts.URL, nil)

The test still fails at -count=1000

@cuonglm
Copy link
Member Author

cuonglm commented Oct 26, 2019

@bcmills
Copy link
Contributor

bcmills commented Oct 31, 2019

linux-386-clang: https://build.golang.org/log/9c05e68c70e7bf8388daf62daf8e777c4ccec143

--- FAIL: TestDontCacheBrokenHTTP2Conn (0.02s)
    transport_test.go:5889: for iteration 5, doBreak=false; unexpected error Get "https://127.0.0.1:45761": some write error
    transport_test.go:5893: GotConn calls = 0; want 1
    transport_test.go:5853: unexpected Dial error: dial tcp 127.0.0.1:45761: connect: connection reset by peer

@bcmills bcmills closed this as completed Nov 12, 2019
@bcmills bcmills reopened this Nov 12, 2019
@gopherbot
Copy link

Change https://golang.org/cl/206819 mentions this issue: net/http: add some debugging to TestDontCacheBrokenHTTP2Conn

gopherbot pushed a commit that referenced this issue Nov 13, 2019
Not a fix, but will give us more info when it flakes again.

Updates #35113

Change-Id: I2f90c24530c1bea81dd9d8c7a59f4b0640dfa4c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/206819
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@bcmills
Copy link
Contributor

bcmills commented Jan 22, 2020

@bradfitz, any thoughts on how we should address this?

@bradfitz
Copy link
Contributor

No immediate thoughts. I need to get it all in my head again and debug but I've been busy with other stuff. I'll try work on it this week.

@gopherbot
Copy link

Change https://golang.org/cl/218097 mentions this issue: net/http: fix TestDontCacheBrokenHTTP2Conn flaky

@bcmills
Copy link
Contributor

bcmills commented Feb 20, 2020

2020-02-20T03:27:46-715e8bb/windows-amd64-longtest
2020-02-19T21:53:23-8a8adc2/darwin-amd64-10_11
2020-02-19T21:53:23-8a8adc2/windows-amd64-race
2020-02-19T21:18:17-2f2e97c/linux-amd64-race
2020-02-19T21:00:02-1bf9d3c/windows-amd64-race
2020-02-05T22:19:16-e5b9c10/windows-amd64-race
2020-02-05T20:54:27-88ae4cc/windows-amd64-race
2020-01-31T18:50:59-d91c3bc/windows-amd64-race
2020-01-29T16:34:44-2e4f490/aix-ppc64
2020-01-29T04:04:52-c436ead/windows-amd64-race
2020-01-27T23:23:00-22f09ce/windows-amd64-2016
2020-01-23T21:01:12-ace25f8/linux-mips64le-mengzhuo
2020-01-23T16:52:00-64378c2/linux-amd64-race
2020-01-23T14:36:57-82a2f82/linux-amd64-race
2020-01-17T17:38:40-316fd8c/linux-mips64le-mengzhuo
2020-01-15T04:21:28-a52db64/openbsd-amd64-64
2020-01-08T20:07:45-817afe8/windows-amd64-race
2020-01-07T02:56:05-edf3ec9/plan9-arm
2020-01-06T16:23:27-e8729a7/windows-amd64-race
2020-01-06T11:46:56-693748e/windows-amd64-longtest
2020-01-03T21:09:16-9c3869d/linux-mips64le-mengzhuo
2020-01-02T23:00:12-a65f088/darwin-amd64-race
2020-01-02T17:36:29-bf26847/darwin-arm64-corellium
2019-12-23T23:29:48-372efbb/freebsd-arm-paulzhol
2019-12-20T23:07:52-4d5bb9c/linux-amd64-nocgo
2019-12-19T17:22:29-a197c7b/darwin-amd64-10_11
2019-12-13T20:41:04-7d30af8/windows-amd64-longtest
2019-12-12T19:23:45-22d28a2/darwin-arm64-mn4m2zdaios
2019-12-11T19:51:34-1b1fbb3/darwin-amd64-race
2019-12-09T16:41:15-a3a630b/freebsd-amd64-race
2019-12-06T23:12:11-df0ac45/darwin-amd64-nocgo
2019-12-05T17:23:50-50535e6/linux-mips64le-rtrk
2019-12-03T15:26:50-2ac1ca9/android-386-emu
2019-12-02T16:11:24-3edd1d8/netbsd-amd64-8_0

@cuonglm
Copy link
Member Author

cuonglm commented Feb 21, 2020

@bcmills is there anyone I can add to reviewers list?

@bradfitz seems to be busy at this time.

@bcmills
Copy link
Contributor

bcmills commented Feb 21, 2020

I'm looking for another reviewer. If I can't find one I'll dig into the code a bit more and see if I can get comfortable enough to +2 it myself.

@rsc
Copy link
Contributor

rsc commented Feb 25, 2020

I'm looking at this and the earlier CL.

@gopherbot
Copy link

Change https://golang.org/cl/220905 mentions this issue: net/http: fix handling of HTTP/2 upgrade failures

@bakul
Copy link

bakul commented Feb 26, 2020

FWIW, I see the same test failing on Mojave (as part of running all.bash on 1.4).

    --- FAIL: TestDontCacheBrokenHTTP2Conn (0.01s)
        transport_test.go:6079: for iteration 5, doBreak=false; unexpected
    error Get "https://127.0.0.1:63138": some write error
        transport_test.go:6083: GotConn calls = 0; want 1
    panic: Fail in goroutine after TestDontCacheBrokenHTTP2Conn has completed

    goroutine 3645 [running]:
    testing.(*common).Fail(0xc0003d9200)
            /usr/local/go/src/testing/testing.go:613 +0x11a
    testing.(*common).Errorf(0xc0003d9200, 0x1585951, 0x19, 0xc0002b5920, 0x1,
    0x1)
            /usr/local/go/src/testing/testing.go:712 +0x90
    net/http_test.TestDontCacheBrokenHTTP2Conn.func2(0x157700a, 0x3,
    0xc0005bacb0, 0xf, 0xc0004519a8, 0xc00003b000, 0xc0004519a8, 0x10f65fe)
            /usr/local/go/src/net/http/transport_test.go:6044 +0x10a
    net/http.(*Transport).dial(0xc0004a32c0, 0x165aa60, 0xc000692990,
    0x157700a, 0x3, 0xc0005bacb0, 0xf, 0x0, 0x120, 0x0, ...)
            /usr/local/go/src/net/http/transport.go:1088 +0x7b
    net/http.(*Transport).dialConn(0xc0004a32c0, 0x165aa60, 0xc000692990, 0x0,
    0xc000283020, 0x5, 0xc0005bacb0, 0xf, 0x0, 0xc0002eb8c0, ...)
            /usr/local/go/src/net/http/transport.go:1519 +0x19ee
    net/http.(*Transport).dialConnFor(0xc0004a32c0, 0xc0007c0630)
            /usr/local/go/src/net/http/transport.go:1365 +0xc6
    created by net/http.(*Transport).queueForDial
            /usr/local/go/src/net/http/transport.go:1334 +0x3fe
    FAIL    net/http        0.732s

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants