Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestClientRedirectContext is flaky #16049

Closed
eliasnaur opened this issue Jun 13, 2016 · 6 comments
Closed

net/http: TestClientRedirectContext is flaky #16049

eliasnaur opened this issue Jun 13, 2016 · 6 comments
Labels
FrozenDueToAge help wanted Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@eliasnaur
Copy link
Contributor

The TestClientRedirectContext test sometimes fail with this:

--- FAIL: TestClientRedirectContext (0.00s)
    client_test.go:317: url.Error.Err = dial tcp 127.0.0.1:60809: operation was canceled; want errRequestCanceled or errRequestCanceledConn
FAIL
FAIL    net/http    15.840s

Recent flake on the darwin/386 builder:

https://build.golang.org/log/618f21103437358fda63d77ffaf92dd8b581f6d2

@ianlancetaylor ianlancetaylor added this to the Go1.7Maybe milestone Jun 13, 2016
@ianlancetaylor ianlancetaylor added the Testing An issue that has been verified to require only test changes, not just a test failure. label Jun 13, 2016
@adg adg assigned bradfitz and adg Jun 14, 2016
@adg adg removed their assignment Jun 27, 2016
@bradfitz
Copy link
Contributor

I can't reproduce this on my Mac laptop even with thousands of runs, in normal mode or with -race (which randomizes goroutine scheduling).

The code also seems fine.

I wonder if the failure was some network flake?

In any case, punting to Go 1.8. Help wanted.

@bradfitz bradfitz modified the milestones: Go1.8, Go1.7Maybe Jun 27, 2016
@eliasnaur
Copy link
Contributor Author

Thank you for looking into this. FWIW, it happened recently on the darwin/arm builder:

https://build.golang.org/log/2a4f80a178c686db64e9bf563d8511ff3dfa0410

@gregory-m
Copy link
Contributor

I can constantly reproduce it on my Mac laptop (OS X 10.11.5). Using stress command

⇢ $ go test -gcflags= net/http -c                                                                                                                                      
⇢ $ stress  ./http.test -test.short -test.run TestClientRedirectContext
1514 runs so far, 0 failures
3007 runs so far, 0 failures

/var/folders/lw/kb0_gx6x43d6jv3mj4s4t6v00000gn/T/go-stress949703166
--- FAIL: TestClientRedirectContext (0.00s)
    client_test.go:317: url.Error.Err = dial tcp 127.0.0.1:53382: operation was canceled; want errRequestCanceled or errRequestCanceledConn
FAIL

But I can't produce any valuable stack trace.
If running with runtime.StartTrace() the test pass forever.

Any ideas how to get any valuable info?

@jaredborner
Copy link

i've found some things that may help here

there seems to be a race condition in Transport.getConn (transport.go:797), specifically between the go func wrapping t.dialConn (transport.go:839) and the select statement following it (transport.go:845)

in the test case, Transport.getConn is called twice with the context canceled between these calls

the first time:

  • t.dialConn returns a nil error
  • this nil error is returned from the select statement

the second time:

  • t.dialConn returns an operation was canceled error
  • usually the select statement doesn't see this and executes the <-req.Context().Done() case (returning errRequestCanceledConn)...
  • ... but sometimes the t.dialConn error makes it through the channel in time and is returned instead

the following added log statements highlight this.

passing:

client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: <nil>
transport.go:853: returning from '<-dialc' case. err: <nil>
client.go:496: checking redirect
client_test.go:305: canceling
client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: dial tcp 127.0.0.1:64091: operation was canceled
transport.go:871: returning from '<-req.Context().Done()' case
PASS

so even though errRequestCanceledConn is returned, the operation was canceled error is still present.

failing:

client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: <nil>
transport.go:853: returning from '<-dialc' case. err: <nil>
client.go:496: checking redirect
client_test.go:305: canceling
client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: dial tcp 127.0.0.1:64097: operation was canceled
transport.go:853: returning from '<-dialc' case. err: dial tcp 127.0.0.1:64097: operation was canceled
--- FAIL: TestClientRedirectContext (0.21s)
    client_test.go:319: url.Error.Err = dial tcp 127.0.0.1:64097: operation was canceled; want errRequestCanceled or errRequestCanceledConn

also, adding a small time.Sleep call between the go func wrapping the t.dialConn and the select statement makes this behavior much more reproducible for me (~50% of the time just using regular go test)

@bradfitz
Copy link
Contributor

bradfitz commented Jul 6, 2016

@jaredborner, thanks! That let me reproduce it. I sent out https://golang.org/cl/24748

@bradfitz bradfitz modified the milestones: Go1.7Maybe, Go1.8 Jul 6, 2016
@gopherbot
Copy link

CL https://golang.org/cl/24748 mentions this issue.

@golang golang locked and limited conversation to collaborators Jul 7, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

7 participants